Author manuscript; available in PMC: 2022 Mar 1.
Published in final edited form as: Regul Toxicol Pharmacol. 2020 Dec 17;120:104843. doi: 10.1016/j.yrtph.2020.104843

A cross-industry collaboration to assess if acute oral toxicity (Q)SAR models are fit-for-purpose for GHS classification and labelling

Joel Bercu a, Melisa J Masuda-Herrera a, Alejandra Trejo-Martin a, Catrin Hasselgren b, Jean Lord b, Jessica Graham c, Matthew Schmitz d, Lawrence Milchak e, Colin Owens e, Surya Hari Lal f, Richard Marchese Robinson f, Sarah Whalley f, Phillip Bellion g, Anna Vuorinen g, Kamila Gromek h, William A Hawkins i, Iris van de Gevel j, Kathleen Vriens j, Raymond Kemper k, Russell Naven k, Pierre Ferrer l, Glenn J Myatt m,2
PMCID: PMC8005249  NIHMSID: NIHMS1661918  PMID: 33340644

Abstract

This study assesses whether currently available acute oral toxicity (AOT) in silico models, provided by the widely employed Leadscope software, are fit-for-purpose for categorization and labelling of chemicals. As part of this study, a large data set of proprietary and marketed compounds from multiple companies (pharmaceutical, plant protection products, and other chemical industries) was assembled to assess the models’ performance. The absolute percentage of correct or more conservative predictions, based on a comparison of experimental and predicted GHS categories, was approximately 95%, after excluding a small percentage of inconclusive (indeterminate or out of domain) predictions. Since the frequency distribution across the experimental categories is skewed towards low toxicity chemicals, a balanced assessment was also performed. Across all compounds which could be assigned to a well-defined experimental category, the average percentage of correct or more conservative predictions was around 80%. These results indicate the potential for reliable and broad application of these models across different industrial sectors. This manuscript describes the evaluation of these models, highlights the importance of an expert review, and provides guidance on the use of AOT models to fulfill testing requirements, GHS classification/labelling, and transportation needs.

Introduction

The purpose of the acute oral toxicity (AOT) study is to characterize general degrees of toxicity and understand the potential for a compound to cause life-threatening effects from an acute exposure. Regulatory authorities often require the AOT testing of substances in order to characterize their toxicity and assign hazard categories, which inform the labelling of products to indicate appropriate restrictions and precautions to be taken during their handling, transportation, or use (Hamm et al., 2017). While the exact requirements for the content and formatting of labelling may vary by the product type, regulatory agency, and use context, there have been numerous international efforts over the last several decades to harmonize hazard identification, classification and labelling (Strickland et al., 2018). Examples of frameworks include the United Nations (UN) Recommendations on the Transport of Dangerous Goods and the Globally Harmonized System (GHS) of Classification and Labelling of Chemicals (UN 2019a; UN 2019b). Each framework is regularly revised and updated to reflect national, regional and international experiences in implementing their requirements into laws, as well as the experiences of users who perform the classification and labelling (UN 2019a).

AOT studies are required for the majority of compounds produced at ≥ 1 ton per year and manufactured or imported in the European Union (EU) or European Economic Area (EEA), as part of the EU’s legislation on the registration, evaluation, authorization and restriction of chemicals (REACH) (EU 2006; ECHA 2015), as well as for other international compound registrations. AOT information is also utilized to define labeling information for safety data sheets (SDS) and containers as defined by the UN’s GHS for classification and labelling of chemicals (i.e., the purple book, EU’s Classification, Labelling and Packaging (CLP)) (UN GHS 2005; EU 2017). Finally, AOT information guides how a chemical should be packaged, labeled, and/or transported (49 CFR, Part 178; 16 CFR 1500.3; IATA 2020). The well-established practice and widespread use of AOT studies for these intended purposes, as well as an overall lack of non-animal alternatives, result in the mandated necessity to continue to conduct these tests.

The median lethal dose, LD50, is a general indicator of a chemical substance’s acute systemic toxicity. The LD50 values from acute toxicity tests in rodents serve as the basis for the toxicological classification. The most commonly performed tests for acute toxicity are described in the OECD guidelines (OECD 2008) and are essentially identical to those called for under the Toxic Substances Control Act (TSCA) (TSCA 2016), Federal Insecticide, Fungicide, and Rodenticide Act (FIFRA) (FIFRA 1996), and REACH regulations. The AOT tests, including the limit test, fixed-dose procedure, toxic class method, and up-and-down methods (OECD 2002a, OECD 2002b, OECD 2008, respectively), each represent a more simplified study design compared to the original animal test method (OECD 401, which was deleted in 2002) as a means of minimizing animal use.

GHS provides an internationally compatible system to classify and communicate physical, health, and environmental hazards of a substance for the protection of humans and the environment. Several toxicological endpoints are presented in the GHS regulation to enable proper hazard classification, including acute toxicity by the oral, dermal, and/or inhalation (gases, vapors, dusts & mists) route. There are five GHS categories for acute toxicity (Category 1-5), which are banded based on the dose or concentration required to produce a severe toxic effect or death in 50% of the exposed population (i.e., LD50), with Category 1 chemicals being the most toxic (see Table 1). These five acute toxicity classification categories have corresponding pictograms, signal words, and hazard statements, which are used for hazard communication on safety data sheets and chemical labels (UN GHS 2005). It should be noted that not all classification categories are adopted in all regions in the world. Regulation (EC) 1272/2008 on classification, labelling and packaging of substances and mixtures (CLP Regulation, EU 2008) has adopted Categories 1-4, whereas category 5 substances, with a low toxicity, are designated as “not classified” according to the CLP regulation.

Table 1:

GHS Classification Criteria for AOT

| Acute Toxicity | Category 1 | Category 2 | Category 3 | Category 4 | Category 5 | Not classified (NC) |
|---|---|---|---|---|---|---|
| Oral (mg/kg) | LD50 ≤ 5 | 5 < LD50 ≤ 50 | 50 < LD50 ≤ 300 | 300 < LD50 ≤ 2000 | 2000 < LD50 ≤ 5000 | 5000 < LD50 |
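The banding in Table 1 can be expressed as a simple lookup. The following is a minimal sketch, assuming a single point LD50 value in mg/kg; the function name `ghs_category` and the label strings are illustrative choices, not part of the GHS text.

```python
from bisect import bisect_left

# Upper LD50 bounds (mg/kg, inclusive) for Categories 1-5, taken from Table 1.
UPPER_BOUNDS = [5, 50, 300, 2000, 5000]
LABELS = ["Category 1", "Category 2", "Category 3",
          "Category 4", "Category 5", "Not classified"]


def ghs_category(ld50_mg_per_kg: float) -> str:
    """Return the GHS acute oral toxicity category for a point LD50 value."""
    if ld50_mg_per_kg <= 0:
        raise ValueError("LD50 must be positive")
    # bisect_left returns the index of the first bound >= LD50, so a value
    # exactly on a boundary (e.g. 50) falls in the more toxic category.
    return LABELS[bisect_left(UPPER_BOUNDS, ld50_mg_per_kg)]


print(ghs_category(120))  # an LD50 of 120 mg/kg lies in the 50-300 band
```

Because the upper bounds are inclusive, `bisect_left` places a boundary value such as 300 mg/kg in Category 3 rather than Category 4, in line with the inequalities in Table 1.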

There is a balance in toxicology research between understanding the hazards of chemicals and the need for animal testing (OECD 2001). The “3Rs” is a global initiative geared toward reducing animal use in research and stands for (1) Replacing animal-dependent study methods with reliable/comparable alternative methods, (2) Reducing the number of animals in a study, and/or (3) Refining studies to improve animal welfare (Russell and Burch, 1959). Industry implements the 3Rs to accelerate scientific discovery, support innovation and technological developments, and address societal concerns about animal research. There are ongoing national and international efforts to employ the 3Rs across toxicology testing and gain regulatory endorsement (NC3Rs 2020; EFPIA 2019; AnimalResearch.Info 2018; Lautenberg Chemical Safety Act 2016; Tox21 2008). Additionally, EU Directive 2010/63/EU mandated the application of reduction, refinement and replacement across the EU (EU Directive 2010/63/EU).

There have been efforts to reduce the number of laboratory animals needed for the existing in vivo methodologies utilized for determining the AOT of compounds. The new OECD guidelines for AOT studies reduced the number of animals needed to define a point estimate while also enabling a more harmonized approach to classifying compounds based on their AOT hazard (UN GHS 2005). Introduction of a limit dose (2,000 mg/kg) and a maximum tested dose (5,000 mg/kg) to define “not acutely toxic” also reduced the number of animals required for compounds of low toxicity, as there was no need for excessive dosing (OECD 2002a, OECD 2002b, OECD 2008; UN GHS 2005). The approval of the Fixed Dose Procedure (OECD TG 420), Acute Toxic Class method (OECD TG 423) and Up-and-Down Procedure (OECD TG 425) was also a considerable advance, as historical studies utilized ~100 animals per study whereas these newer test guidelines utilize 2-15 animals per study (Erhirhie et al., 2017). In addition, the fixed dose procedure relies on clear signs of toxicity at fixed dose levels rather than lethality, which reduces animal use and offers a refinement that improves animal welfare (OECD 2002a).

At the time of preparing this paper, there are no validated (e.g. OECD test guidelines), internationally accepted, animal-free alternatives to the acute oral toxicity animal study that regulatory bodies accept. Based on their common use in cytotoxicity assessments, the 3T3 (mouse fibroblasts) neutral red uptake (NRU) and the NHK (human keratinocytes) NRU in vitro methods have been evaluated as potential alternatives to AOT testing (Creton et al., 2010; Schrage et al., 2011; OECD 2010). However, these methods were found not to be sufficiently accurate as stand-alone test methods, but were recommended for incorporation into a weight of evidence approach for the selection of starting doses for rodent AOT tests (Creton et al., 2010). (Quantitative) structure activity relationship – or (Q)SAR – models have also not been sufficiently developed or validated to enable them to be used as stand-alone alternatives to animal testing or to classify and waive/not test in the case of REACH. However, (Q)SAR information can be used to supplement experimental test data as part of a weight of evidence or an Intelligent Testing Strategy (ITS) approach (ECHA, 2008; Creton et al., 2010).

AOT in silico model development is aligned with the 3Rs mission to replace existing methods that require laboratory animals. An AOT in silico model offers an animal-free way to elucidate a compound’s acute hazards to fulfill testing requirements, classification/labelling, or transportation purposes. Fundamental to the success of a global AOT in silico model is a sufficiently representative, large and high-quality database and algorithms which have the capability to make reliable predictions for a broad range of chemical structures. (In the case of a statistical QSAR, the model itself would be derived from the database using an algorithm, but the manner in which any (Q)SAR makes predictions of chemical hazard may be considered an algorithm, with data not seen during the model development procedure required for external validation of the final model.) A reliable AOT in silico model could complement an existing laboratory study to further reduce animals or refine existing procedures. For example, an in silico AOT model can assist in predicting the starting dose for the OECD 420 AOT test (the only AOT test with a non-lethal endpoint), enabling the minimum number of animals to be used and avoiding lethality. As another example, if the LD50 is predicted to be >2000 mg/kg, the limit dose can be utilized as the starting dose with greater confidence, eliminating the need for lower doses to be tested and reducing the number of animals used. In addition to use in regulatory requirements, classification and labelling, and transportation needs, a reliable AOT in silico tool has potential utility in early stages of research and development as an alternative to in vivo testing for assessing the likelihood of acute oral toxicity for a given chemical series to guide subsequent testing strategies and compound design.

If an alternative model predicts AOT as reliably as an in vivo study, the alternative method should be preferred and supported. When evaluating an alternative method, it should also be understood that the in vivo AOT test itself has a variable response (Pham et al., 2020). Variability, i.e. differences in the GHS class observed for the same chemical, has been observed in animal studies, with 18%-25% of studies (depending on the route of exposure) on the same compound resulting in a different GHS category (Allen et al., 2019), and even more so (25-27% variability; Karmaus 2018) in test sets currently under investigation as alternatives to the AOT test. In silico models should not show variability for the same compound, but their accuracy or apparent accuracy will necessarily be limited by the variability in the experimental data used for training and/or testing. Still, if experimental endpoint values used for training or testing were derived from multiple test results per chemical, the variability in the endpoint data could be reduced from the variability in single test results, potentially allowing in silico predictions to be more reliable than individual test results, but not more reliable than the endpoint values seen during training. Therefore, it is expected that there will be an acceptable limit on the accuracy of in silico predictions, as has been observed with AOT responses in animal studies.

(Q)SAR in silico models are increasingly being considered to predict specific toxicological endpoints, such as LD50, based on the chemical structure alone (Lapenna et al., 2010; Drwal et al., 2014; NASEM 2015; Kleinstreuer et al., 2018). The purpose of this paper is to explore the use of in silico models to advance the 3Rs for AOT. This paper will assess in silico models against chemicals such as pharmaceuticals, pharmaceutical intermediates, plant protection products, plant protection product intermediates, metabolites, and starting materials, along with specialty chemicals submitted by manufacturers, to determine their performance compared to animal models. The results will guide the use and application of in silico models within the framework of existing regulations such as REACH, GHS, and transportation. Specifically, this paper outlines a cross-industry collaboration where each organization collected historical AOT experimental data and ran AOT models over these chemicals. Each collaborator shared the experimental and predicted results, and an analysis of all results was performed to understand the AOT models’ performance across different methodologies, across different chemical sectors, and for the consensus results. In addition, an expert review of experimentally classified category 1 and category 2 results was performed to understand how such a review would support the overall workflow.

Methodology

(Q)SAR Models

There are two commonly used (Q)SAR methodologies, referred to as expert rule-based and statistical-based (Myatt et al., 2017). Leadscope (an Instem company) has recently developed and made available a first generation of (Q)SAR models covering both methodologies to predict GHS categories for rat acute oral toxicity (Leadscope 2020). Both methodologies use a database of over 15,000 chemicals with rat AOT results from a number of sources including the Registry of Toxic Effects of Chemical Substances, ECHA, the EU Joint Research Centre’s AcutoxBase, the National Library of Medicine (NLM) Hazardous Substances Data Bank, OECD (eChemPortal), PAI (NICEATM) and TEST (NLM ChemIDplus) (RTECS 2011; Kleinstreuer et al., 2018).

A series of individual models have been developed from this combined dataset and used to predict GHS categories (1 to 5 and NC). These individual statistical models or sets of expert alerts predict whether a chemical is below a specified LD50 threshold corresponding to the GHS cut-off values. The statistical-based models use a Partial Logistic Regression algorithm that incorporates structural features and calculated physico-chemical properties. Whilst the models have undergone subsequent development, they build upon the approach previously reported in the literature (Yang 2005). For the expert rule-based models, a set of 2,867 structural alerts was encoded to predict whether a chemical is below a specified GHS threshold. These models are then used within a decision tree to compute a GHS category (Myatt et al., 2019).

This decision tree approach is outlined in Figure 1, where a GHS category is predicted for each individual methodology, together with an overall GHS category prediction derived from the individual methodologies. In Figure 1, a chemical is predicted to be GHS category 3 using the expert rule-based approach and GHS category 4 using the statistical-based methodology. For the expert rule-based method, a set of alerts predicts whether the chemical’s LD50 is below the 5 mg/kg threshold. Since it was not predicted to be below this threshold, a second alert set is used to determine whether the chemical is below the 50 mg/kg threshold. Again, the prediction was negative; however, a third set of alerts predicted the chemical was below the 300 mg/kg threshold. Therefore, it was predicted to be between 50 and 300 mg/kg and hence assigned to GHS category 3. A similar process was performed using a series of statistical-based models, as shown in Figure 1. In this case, the overall prediction was category 4 (LD50 in the range of 300-2,000 mg/kg). The most conservative value (GHS category 3) was used as the final consensus prediction from the two methodologies.

Figure 1: Illustration of how a prediction, based on two methodologies, is computed.

Note: the red color coding in the expert rule-based results illustrates where any alert(s) match the test chemical; the color coding in the statistical-based (QSAR) models reflects the weighting of the features used in the model, with red indicating positive association, blue/green negative association and gray showing a lack of clear positive or negative associations
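The threshold cascade and the consensus step described for Figure 1 can be sketched as follows. This is an illustrative outline only: the per-threshold outcomes are supplied as booleans rather than computed by real alert sets or statistical models, the names `cascade` and `consensus` are hypothetical, and the handling of inconclusive results is not modeled.

```python
from typing import Sequence

# GHS categories ordered from most to least toxic.
CATEGORIES = ["Cat. 1", "Cat. 2", "Cat. 3", "Cat. 4", "Cat. 5", "NC"]


def cascade(below_threshold: Sequence[bool]) -> str:
    """Walk the thresholds (5, 50, 300, 2000, 5000 mg/kg) in order and stop
    at the first one the chemical is predicted to fall below."""
    for category, is_below in zip(CATEGORIES, below_threshold):
        if is_below:
            return category
    return "NC"  # not predicted below any threshold that was evaluated


def consensus(*predictions: str) -> str:
    """Keep the most conservative (most toxic) of the individual predictions."""
    return min(predictions, key=CATEGORIES.index)


# Worked example from the text: the alerts are negative at 5 and 50 mg/kg and
# positive at 300 mg/kg; the statistical models are first positive at 2000 mg/kg.
expert = cascade([False, False, True])
statistical = cascade([False, False, False, True])
final = consensus(expert, statistical)
print(expert, statistical, final)
```

Taking the minimum over the category order is what makes the consensus conservative: a disagreement between the two methodologies is always resolved toward the more toxic category.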

Figure 4: Number of chemicals for each experimental in vivo GHS category

The models allow for inspection of the underlying model information, such as feature weightings, to support an expert review. In addition, it is possible to review analogs in the database to provide additional supportive evidence, as shown in Figure 2.

Figure 2: Analogs of the test chemical with known GHS categories derived from in vivo data.

Note: Analogs were determined based on a Tanimoto similarity score using Leadscope’s pre-defined structural features
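For readers unfamiliar with the Tanimoto score mentioned in the note to Figure 2, the standard set-based form is shown below. The feature names are hypothetical placeholders; Leadscope’s actual feature set and any weighting it applies are not described here and are not reproduced.

```python
def tanimoto(features_a: set, features_b: set) -> float:
    """Tanimoto (Jaccard) similarity: shared features / all distinct features."""
    union = features_a | features_b
    if not union:
        return 0.0  # convention chosen here for two empty feature sets
    return len(features_a & features_b) / len(union)


# Hypothetical feature sets, for illustration only.
test_chemical = {"benzene", "carboxylic acid", "primary amine"}
analog = {"benzene", "carboxylic acid", "chloride"}
score = tanimoto(test_chemical, analog)  # 2 shared / 4 distinct = 0.5
print(score)
```

A score of 1.0 means the two chemicals share exactly the same features, and 0.0 means they share none.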

Running (Q)SAR models and assembling datasets

Collaborators were given access to the acute toxicity (Q)SAR models from Leadscope (Leadscope acute rat oral QSAR (v1) and alerts (v1) [System: Leadscope Model Applier v2.4]) to use in this exercise. Each collaborator collected historical information on chemicals where a rat AOT had been performed, with a Klimisch score of 1 or 2 (Klimisch et al., 1997) where possible, along with information on the study protocol, study parameters and results (for the chemicals from the plant protection product sector, 24% of compounds were retrieved from the Pesticide Properties Database (Lewis et al., 2016)). In some cases, a GHS category was derived and in other cases an LD50 value or range was identified. The chemicals were then loaded into the (Q)SAR software and prediction results were generated. The software calculated one of the following 8 values for each test chemical: Category 1, Category 2, Category 3, Category 4, Category 5, Not Classified (NC), Out-of-Domain, or Indeterminate. The software may generate an out-of-domain result where a chemical is too different from the training set examples for a reliable prediction to be made, or where the model’s features do not overlap with features in the test chemical. The software may also generate an indeterminate prediction where there is conflicting information, such as where the influence of substituents around a chemical class is not fully understood. Any chemical determined to be part of the training set was removed. This information was then transferred to Excel spreadsheets along with relevant supporting information on the studies. To avoid sharing any potentially confidential information on the individual chemicals, all information that could provide any chemical identification was removed. However, a reference identifier was requested for each chemical in case questions needed to be resolved later.

Curating and combining the results

Each collaborator shared their in vivo results, as shown in Figure 3. Initially, the individual results were analyzed to remove entries that could not be used in this exercise, based on the following rules:

  • When an in vivo LD50 range was provided that spans multiple GHS categories (except for >2,000 mg/kg since the 5,000 mg/kg dose is often only used when it can be justified)

  • In cases where it was possible to identify from the software output that a chemical was present in the underlying model’s database

Figure 3: Combining the results from multiple companies

In some cases, the individual collaborators provided both LD50 and GHS category results; in others, only LD50 values or ranges were provided. The following rules were adopted to consistently process the data:

  • When only LD50 values were provided, a GHS category corresponding to the LD50 value or range was computed

  • When both an LD50 and GHS category were provided then the GHS category was used when justified by the collaborator

  • When an experimental value of >2000 mg/kg was reported, a “Category 5 or Not Classified” entry was used
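The exclusion and processing rules above could be applied along the following lines. This is a sketch under stated assumptions: a bounded range is read as low < LD50 ≤ high, an open-ended result is passed as `high=None`, and the treatment of open-ended results other than ">2000 mg/kg" is a guess at a reasonable convention, not something specified in the text. The function name `category_from_range` is hypothetical.

```python
from bisect import bisect_left, bisect_right
from typing import Optional

UPPER_BOUNDS = [5, 50, 300, 2000, 5000]  # mg/kg, from Table 1
LABELS = ["Cat. 1", "Cat. 2", "Cat. 3", "Cat. 4", "Cat. 5", "NC"]


def category_from_range(low: float, high: Optional[float]) -> Optional[str]:
    """Assign an experimental category to an LD50 result.

    low == high  : a point LD50 value
    low <  high  : a range read as low < LD50 <= high (assumption)
    high is None : an open-ended result such as "> 2000 mg/kg"
    Returns None when the entry spans several categories and is excluded.
    """
    if high is None:
        if low >= 5000:
            return "NC"
        if low >= 2000:
            return "Cat. 5 or NC"  # combined class used in the analysis
        return None  # e.g. "> 300 mg/kg" spans several categories
    if low == high:
        return LABELS[bisect_left(UPPER_BOUNDS, low)]
    first = bisect_right(UPPER_BOUNDS, low)   # category just above `low`
    last = bisect_left(UPPER_BOUNDS, high)    # category containing `high`
    return LABELS[first] if first == last else None


print(category_from_range(50, 300))     # one category: kept
print(category_from_range(50, 2000))    # two categories: excluded
print(category_from_range(2000, None))  # limit-dose result
```

Returning `None` for excluded entries keeps the removal rule and the category assignment in one place, so an entry is either assigned exactly one experimental class or dropped.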

Generating summary statistics

The results were consolidated (as shown in Figure 3), and a series of summary statistics were generated for the entire dataset as well as subsets including collections from the pharmaceutical industry, plant protection product industry and other chemical industries. These summary statistics use an assessment of whether the experimental in vivo GHS category exactly matched the predicted GHS category. In cases where the experimental category was assigned to the category “Category 5 or Not Classified”, a correct match was recorded if the prediction was Category 5 or Not Classified.

A series of summary statistics were calculated to support an assessment of whether the (Q)SAR test is fit-for-purpose for classification and labeling, that is, whether it predicts either the correct or a more potent category. This analysis was performed on both the entire data set and subsets of the data, as explained below.

  1. The proportion of compounds correctly or more conservatively classified (for example, if the in vivo GHS category was 3, then a prediction of GHS 1, 2 or 3 would be a match)

  2. The proportion of compounds correctly predicted or one category more conservative (for example, if the in vivo GHS category was 3, then a prediction of GHS 2 or 3 would be a match)

Two additional summary statistics were computed to assess the accuracy of the models.

  1. The proportion of compounds correctly predicted (for example, if the in vivo GHS category was 3, then only a prediction of GHS 3 would be a match)

  2. The proportion of compounds correctly predicted or one category higher/lower (for example, if the in vivo GHS category was 3, then a prediction of GHS 2, 3 or 4 would be a match)
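The four match criteria above differ only in how far, and in which direction, a prediction may deviate from the experimental category. A minimal sketch follows, with hypothetical names (`offset`, `CRITERIA`, `matches`); the combined “Category 5 or Not Classified” class is not handled here.

```python
# Rank 1 is the most toxic category.
ORDER = {"Cat. 1": 1, "Cat. 2": 2, "Cat. 3": 3, "Cat. 4": 4, "Cat. 5": 5, "NC": 6}


def offset(experimental: str, predicted: str) -> int:
    """Categories by which the prediction is more conservative (positive)
    or less conservative (negative) than the experimental result."""
    return ORDER[experimental] - ORDER[predicted]


CRITERIA = {
    "correct or more conservative": lambda d: d >= 0,
    "correct or one more conservative": lambda d: 0 <= d <= 1,
    "correct": lambda d: d == 0,
    "correct (+/- one category)": lambda d: abs(d) <= 1,
}


def matches(experimental: str, predicted: str) -> dict:
    """Evaluate all four criteria for one experimental/predicted pair."""
    d = offset(experimental, predicted)
    return {name: rule(d) for name, rule in CRITERIA.items()}


# The example from the text: experimental category 3, predicted category 2.
print(matches("Cat. 3", "Cat. 2"))
```

Expressing each criterion as a condition on a single signed offset makes the relationship between the fit-for-purpose statistics (one-sided) and the accuracy statistics (symmetric) explicit.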

For each of these statistics, an overall assessment (i.e., the proportion across all test compounds) as well as a balanced assessment (based on the average proportion for each experimental in vivo GHS category) was calculated. Whilst the values derived from the overall assessment are more intuitive, the fact that the dataset was skewed towards a higher proportion of low toxicity chemicals (see below) makes the latter values more appropriate to consider.

In addition, a baseline was computed using a random model (i.e., a random, uniformly distributed assignment to categories 1 through 5 and not classified) and the same balanced summary statistics were generated for comparison purposes.
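A random baseline of this kind might be simulated as sketched below for the "correct or more conservative" statistic. The category counts are hypothetical and deliberately skewed, the seed is arbitrary, and the result of any single draw will fluctuate, particularly for sparsely populated categories; this is not the procedure or the random draw used in the study.

```python
import random
from statistics import mean

CATEGORIES = ["Cat. 1", "Cat. 2", "Cat. 3", "Cat. 4", "Cat. 5", "NC"]


def balanced_random_baseline(counts: dict, seed: int = 0) -> float:
    """Balanced 'correct or more conservative' rate of a uniform random model.

    `counts` maps each experimental category to its number of chemicals.
    """
    rng = random.Random(seed)
    per_category = []
    for exp_index, category in enumerate(CATEGORIES):
        n = counts[category]
        # A lower index means a more toxic, i.e. more conservative, prediction.
        hits = sum(rng.randrange(len(CATEGORIES)) <= exp_index for _ in range(n))
        per_category.append(hits / n)
    # Balanced: average the per-category rates so each category counts equally.
    return mean(per_category)


# Hypothetical, deliberately skewed category counts.
example_counts = {"Cat. 1": 10, "Cat. 2": 30, "Cat. 3": 100,
                  "Cat. 4": 400, "Cat. 5": 250, "NC": 600}
baseline = balanced_random_baseline(example_counts)
print(f"{baseline:.1%}")
```

Averaging per-category rates, rather than pooling all chemicals, prevents the large low-toxicity classes from dominating the baseline in the same way they would dominate an overall proportion.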

Expert review

An additional manual assessment of experimentally determined category 1 or 2 chemicals that were predicted by the (Q)SAR models to be in a less potent category was performed. This assessment used both information generated by the software (e.g., analogs, feature weightings) and any other information that would have been generated, including any in vitro assay results indicating a chemical’s mechanism/mode of action (MoA). The analysis was then revised based on any modified results from this expert review.

Results

Results were provided from 3M, AbbVie, Bristol Myers Squibb (BMS), DSM, Genentech, Gilead Sciences, GlaxoSmithKline (GSK), Johnson and Johnson (J&J), Syngenta and Vertex. Information on 2,568 chemicals was provided and, after processing the results, 2,290 chemicals were used in the analysis. Given that the identities of the chemicals were not shared, it is not possible to determine whether any of the chemicals provided were duplicates; however, since these chemicals represent proprietary lead compounds, candidate active ingredients, intermediates, etc. from different companies, as well as additional marketed plant protection products and metabolites from a single database (Lewis et al., 2016), we can reasonably assume there is limited overlap because of the diverse proprietary chemical space being assessed. Any chemical determined to be part of the training set was removed. Figure 4 shows the number of chemicals in each of the experimental in vivo GHS categories. As previously noted, a category “Cat. 5 or NC” was created for chemicals where the experimental LD50 result was specified as > 2,000 mg/kg.

A summary of how the Leadscope consensus model predicted the experimental in vivo GHS categories is shown in Table 2. The seven experimental categories used in this analysis are listed vertically along with the six predicted categories (cat. 1-5 and NC), shown horizontally. Counts of the number of chemicals are shown in the table. To illustrate, there were 8 chemicals that had experimental in vivo values placing them in category 1. Five of these 8 were predicted by the consensus model as category 1, 2 were predicted as category 2 and the remaining 1 was predicted as category 5. The total value of 2,181 results is less than the 2,290 chemicals analyzed since 109 predictions were inconclusive (approximately 5% were either out-of-domain or indeterminate predictions). From this table, it can be seen that 95% of chemicals were either correctly predicted or were assigned to a more conservative category. However, the skewed nature of this dataset, i.e. the higher percentage of low-toxicity compounds, means that a balanced assessment was also required (see below).

Table 2:

Table showing counts of how the consensus model predicts for the different GHS categories

| Experimental \ Predicted[c] | Cat. 1 | Cat. 2 | Cat. 3 | Cat. 4 | Cat. 5 | NC | Total |
|---|---|---|---|---|---|---|---|
| Cat. 1 | 5[a] | 2 | 0 | 0 | 1 | 0 | 8 |
| Cat. 2 | 5 | 18[a] | 5 | 2 | 2 | 1 | 33 |
| Cat. 3 | 1 | 29 | 52[a] | 40 | 2 | 2 | 126 |
| Cat. 4 | 4 | 43 | 115 | 260[a] | 38 | 8 | 468 |
| Cat. 5 | 1 | 15 | 54 | 106 | 59[a] | 12 | 247 |
| Cat. 5 or NC[b] | 3 | 48 | 164 | 343 | 128[a] | 23[a] | 709 |
| NC | 9 | 32 | 119 | 227 | 116 | 87[a] | 590 |
| Total | 28 | 187 | 509 | 978 | 346 | 133 | 2,181 |

a: Indicates where a correct prediction is made

b: Where chemicals were identified as > 2,000 mg/kg they were placed in category “Cat. 5 or NC” and not in Cat. 5 or NC

c: Not including inconclusive predictions
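The overall figures quoted in the text can be recomputed directly from the counts in Table 2. The short check below is an illustration rather than the authors’ own calculation; the variable names are arbitrary, and the figure of 2,290 analyzed chemicals is taken from the Results text.

```python
# Rows: experimental class; columns: predicted Cat. 1-5, NC (counts from Table 2).
TABLE_2 = {
    "Cat. 1":       [5, 2, 0, 0, 1, 0],
    "Cat. 2":       [5, 18, 5, 2, 2, 1],
    "Cat. 3":       [1, 29, 52, 40, 2, 2],
    "Cat. 4":       [4, 43, 115, 260, 38, 8],
    "Cat. 5":       [1, 15, 54, 106, 59, 12],
    "Cat. 5 or NC": [3, 48, 164, 343, 128, 23],
    "NC":           [9, 32, 119, 227, 116, 87],
}
# Column index of the least conservative prediction that still counts as
# "correct or more conservative" for each experimental class.
LAST_MATCH = {"Cat. 1": 0, "Cat. 2": 1, "Cat. 3": 2, "Cat. 4": 3,
              "Cat. 5": 4, "Cat. 5 or NC": 5, "NC": 5}

conclusive = sum(sum(row) for row in TABLE_2.values())
matched = sum(sum(row[: LAST_MATCH[k] + 1]) for k, row in TABLE_2.items())
analyzed = 2290  # chemicals used in the analysis, as stated in the text

print(f"conclusive predictions: {conclusive}")
print(f"correct or more conservative: {matched / conclusive:.1%}")
print(f"inconclusive: {(analyzed - conclusive) / analyzed:.1%}")
```

The counts give 2,066 of 2,181 conclusive predictions (94.7%) as correct or more conservative and 109 of 2,290 (4.8%) as inconclusive, consistent with the rounded values of 95% and approximately 5% reported above.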

An assessment of the performance of the consensus model for each experimental in vivo GHS category is shown in Table 3. Two summary statistics that help to understand whether the model is fit-for-purpose for classification and labelling are presented: (1) the percentage of correctly predicted chemicals or chemicals predicted to be in a more conservative GHS category and (2) the percentage of correctly predicted chemicals or chemicals predicted in an adjacent more conservative category. Two additional summary statistics were calculated to help understand the accuracy of the model: (1) the percentage of correctly predicted chemicals and (2) the percentage of correctly predicted chemicals or chemicals predicted in an adjacent category. The inconclusive results were not used in calculating the summary statistics.

Table 3:

Breakdown of the results across different categories

| Experimental value | Count | Number of inconclusive predictions[a] | Fit-for-purpose[b]: percentage correct or more conservative | Fit-for-purpose[b]: percentage correct or one more conservative | Accuracy[c]: percentage correct | Accuracy[c]: percentage correct (+/− one category) |
|---|---|---|---|---|---|---|
| Cat. 1 | 8 | 0 | 62.5% | 62.5% | 62.5% | 87.5% |
| Cat. 2 | 34 | 1 | 69.7% | 69.7% | 54.6% | 84.9% |
| Cat. 3 | 128 | 2 | 65.1% | 64.3% | 41.3% | 96.0% |
| Cat. 4 | 482 | 14 | 90.2% | 80.1% | 55.6% | 88.3% |
| Cat. 5 | 259 | 12 | 95.1% | 66.8% | 23.9% | 71.7% |
| Cat. 5 or NC | 754 | 45 | 100.0% | 69.7% | 21.3% | 73.2% |
| NC | 625 | 35 | 100.0% | 34.4% | 14.8% | 34.4% |

a. Not included in the statistics (out of domain or indeterminate).

b. An assessment of whether the (Q)SAR test is fit-for-purpose for classification and labeling, that is, it predicts either the correct or a more potent/conservative category (or predicts one category more potent/conservative)

c. An assessment of the accuracy of the (Q)SAR test, that is the proportion of correctly predicted or +/− one GHS category

The data collected reflects the typical distribution of GHS categories within corporate collections and as such it is highly imbalanced and weighted towards the less toxic compounds. Therefore, an overall balanced assessment of the 4 summary statistics was calculated alongside a baseline (represented by a random model). The balanced summary statistics were computed by averaging the values for each category, shown in Table 3, apart from the “Cat. 5 or NC” values. This information was not used in this assessment since this category spans two experimental categories.

The supplemental material contains analogous information to Tables 2, 3, and 4 for the assessment of statistical-based and the expert rule-based methodologies (supplemental tables S1-S6) as well as the three industrial sectors analyzed: pharmaceutical, plant protection products and other chemicals (Supplemental tables S8-S18). As previously discussed for analysis of the consensus model on the combined dataset, due to the skewed nature of the datasets towards low toxicity chemicals, the balanced statistics presented therein provide valuable insight into the predictive performance of the different types of models on different kinds of chemicals. Table S7 summarizes the results for different (Q)SAR methodologies, statistical-based and expert rule-based, along with the consensus from the two methodologies. The same summary statistics were calculated over all the data (i.e., these values are not balanced). Table S19 summarizes the performance of the consensus model across the different sectors: pharmaceutical sector, plant protection products sector and other chemical sectors.

Table 4:

Balanced summary statistics result

| Model | Fit-for-purpose: average[a] percentage correct or more conservative | Fit-for-purpose: average[a] percentage correct or one more conservative | Accuracy: average[a] percentage correct | Accuracy: average[a] percentage correct (+/− one category) |
|---|---|---|---|---|
| Consensus model | 80.4% | 63.0% | 42.1% | 77.1% |
| Random model | 54.0% | 24.5% | 13.1% | 38.6% |

a. Averages across all experimental classes, excluding compounds in the “Cat. 5 or NC” class.
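The balanced values for the consensus model can be reproduced from the Table 2 counts by computing each statistic per experimental class and then averaging over the six well-defined classes. The sketch below is a worked check with hypothetical names (`COUNTS`, `balanced`), not the authors’ code.

```python
from statistics import mean

# Table 2 counts for the six well-defined experimental classes
# (columns: predicted Cat. 1-5, NC). "Cat. 5 or NC" is left out, as in Table 4.
COUNTS = [
    [5, 2, 0, 0, 1, 0],          # Cat. 1
    [5, 18, 5, 2, 2, 1],         # Cat. 2
    [1, 29, 52, 40, 2, 2],       # Cat. 3
    [4, 43, 115, 260, 38, 8],    # Cat. 4
    [1, 15, 54, 106, 59, 12],    # Cat. 5
    [9, 32, 119, 227, 116, 87],  # NC
]


def balanced(lo: int, hi: int) -> float:
    """Average, over experimental classes, of the share of predictions whose
    column index lies between (row - lo) and (row + hi), clipped to the table."""
    rates = []
    for i, row in enumerate(COUNTS):
        start, stop = max(i - lo, 0), min(i + hi, len(row) - 1)
        rates.append(sum(row[start:stop + 1]) / sum(row))
    return mean(rates)


summary = {
    "correct or more conservative": balanced(lo=5, hi=0),
    "correct or one more conservative": balanced(lo=1, hi=0),
    "correct": balanced(lo=0, hi=0),
    "correct (+/- one category)": balanced(lo=1, hi=1),
}
for name, value in summary.items():
    print(f"{name}: {value:.1%}")
```

Rounded to one decimal place, the four averages are 80.4%, 63.0%, 42.1% and 77.1%, matching the consensus model row of Table 4.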

Supplemental tables S20, S21, and S22 show a series of experimental in vivo category 1 or 2 chemicals from the pharmaceutical industry, plant protection product industry and broader chemical industry that are predicted as a less conservative category, for example, a chemical whose experimental in vivo result is GHS category 1 but whose prediction is category 2, 3, 4, 5 or NC. An assessment of other information that would be available for these chemicals is also provided, including other test results, information on chemical analogs as well as other information from within the deployed models. Based on this information, a determination was made as to whether the chemical would have been correctly categorized based on an expert review of the totality of the information available. Using this information, Tables 5 and 6 illustrate how a combination of using the (Q)SAR models in addition to an expert review would modify the prediction results for experimental in vivo GHS category 1 and 2 chemicals. Table 5 shows a table of counts for these modified results and Table 6 displays the performance metrics for these modified results. In both tables the original results (based on only the (Q)SAR models) are shown in parentheses.

Table 5:

Table showing the results of an expert review of the consensus prediction, with the original consensus prediction results shown in parentheses

Experimental | Predicted Cat. 1 | Predicted Cat. 2 | Predicted Cat. 3 | Predicted Cat. 4 | Predicted Cat. 5 | Predicted NC | Total
Cat. 1 | 7 (5) | 1 (2) | 0 (0) | 0 (0) | 0 (1) | 0 (0) | 8 (8)
Cat. 2 | 6 (5) | 23 (18) | 2 (5) | 1 (2) | 0 (2) | 0 (1) | 32 (33) [a]

[a] There is one fewer compound in the total column for GHS Cat. 2 (i.e., 32 (33)) because one result (ID 703) was assigned to inconclusive after an expert review.

Table 6:

Performance metrics showing the results of an expert review of the consensus prediction, with the original consensus performance metrics without expert review shown in parentheses

Experimental value | Count | Number of inconclusive results | Fit-for-purpose: percentage correct or more conservative | Fit-for-purpose: percentage correct or one more conservative | Accuracy: percentage correct | Accuracy: percentage correct (+/− one category)
Cat. 1 | 8 | 0 | 87.5% (62.5%) | 87.5% (62.5%) | 87.5% (62.5%) | 100% (87.5%)
Cat. 2 | 34 | 1 | 90.6% (69.7%) | 90.6% (69.7%) | 71.9% (54.6%) | 96.9% (84.9%)
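The expert-review percentages in Table 6 follow directly from the counts in Table 5. The Python sketch below is illustrative only: the names `CATEGORIES` and `row_metrics` are assumptions introduced here and are not part of the software used in the study. Categories are ordered from most to least toxic, so a "more conservative" prediction has a lower index than the experimental category.

```python
# Illustrative sketch: derive the Table 6 metrics from one row of Table 5.
# CATEGORIES and row_metrics are hypothetical names, not from the study's software.
CATEGORIES = ["Cat. 1", "Cat. 2", "Cat. 3", "Cat. 4", "Cat. 5", "NC"]  # most to least toxic


def row_metrics(experimental, counts):
    """Return the four percentages for one experimental category.

    counts[i] is the number of compounds predicted as CATEGORIES[i];
    inconclusive predictions are assumed to have been excluded already.
    """
    exp = CATEGORIES.index(experimental)
    total = sum(counts)

    def pct(indices):
        # Ignore indices that fall outside the category scale (e.g. "above" Cat. 1).
        hits = sum(counts[i] for i in indices if 0 <= i < len(counts))
        return 100.0 * hits / total

    return {
        "correct_or_more_conservative": pct(range(0, exp + 1)),
        "correct_or_one_more_conservative": pct([exp - 1, exp]),
        "correct": pct([exp]),
        "correct_plus_minus_one": pct([exp - 1, exp, exp + 1]),
    }


# Counts after expert review, copied from Table 5.
cat1 = row_metrics("Cat. 1", [7, 1, 0, 0, 0, 0])
cat2 = row_metrics("Cat. 2", [6, 23, 2, 1, 0, 0])
print({name: round(value, 1) for name, value in cat1.items()})
print({name: round(value, 1) for name, value in cat2.items()})
```

Rounded to one decimal place, these values agree with the expert-review figures reported in Table 6 for both categories.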

Discussions

Expert review

An expert review of the supporting information is considered best practice to improve the overall reliability of any prediction (Myatt et al., 2018). Such a review supports an assessment of the reliability of the information and may modify the result when there is sufficient supporting evidence. In most cases (as shown in Tables 5 and 6), the less conservative predictions for experimental category 1 and 2 chemicals would have been corrected by an expert review using the following information:

  • related in vitro assay results or information on the chemical’s MoA for therapeutic or pesticidal activity

  • other hazardous properties such as corrosivity

  • a search for chemical analogs (e.g., structural similarity, nearest neighbors)

  • chemical class considerations with known uncertainties (e.g., reactive fluorinated substances)

  • examination of the additional information from the deployed model results and the underlying data

  • potential downstream metabolism

These are items to consider as part of an expert review. In addition, an expert review of inconclusive results may provide sufficient supporting evidence for an assignment to a GHS category.

A formalized procedure is being developed describing what specific in silico model results and/or other experimental data to consider as part of an acute toxicity hazard assessment. This includes recommendations for how such information should be reviewed and consolidated as part of a weight-of-evidence assessment, alongside guidelines for an expert review of this information. The protocol is being developed as part of the in silico toxicology protocol consortium (Myatt et al., 2018). This procedure will help ensure future predictions are performed in a consistent, documented and repeatable manner.

Performance of (Q)SAR models

The performance of the consensus model for different in vivo GHS categories was assessed (see Table 3). For in vivo GHS category 1 or 2 chemicals, the proportion of correct or more conservative predictions was over 60%; however, when an expert review was taken into consideration this proportion increased to approximately 90% (see Table 6). For category 3, although 65.1% were predicted correct or more conservative, 96% were predicted to be in a correct or adjacent category (i.e., either category 2, 3, or 4). For all other categories, the percentage of correct or more conservative predictions was greater than 90%.

Experimental in vivo GHS 1 and 2 categories had a low number of compounds compared to the other classes which indicates that chemicals do not generally fit in the higher potency classes with most pharmaceutical, plant protection product, and other industrial chemicals typically falling in GHS category 3-5 or NC. Since there were fewer chemicals within the higher potency categories, a series of balanced summary statistics were computed to assess whether the consensus model was fit-for-purpose (i.e., predicting the in vivo GHS category or a more conservative category) as shown in Table 4.
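The difference between an overall statistic and a balanced statistic can be sketched in a few lines of Python. The counts below are hypothetical, chosen only to mimic a dataset skewed towards low toxicity chemicals; they are not the study's data, and the function names are assumptions introduced here. In line with the footnote to Table 4, only categories 1 to 4 are included.

```python
# Hypothetical per-category tallies: (fit-for-purpose predictions, total compounds).
# These numbers are invented for illustration and are NOT the study's data.
tallies = {
    "Cat. 1": (6, 10),
    "Cat. 2": (28, 40),
    "Cat. 3": (130, 200),
    "Cat. 4": (570, 600),
}


def overall_percentage(tallies):
    """Pool all compounds; the result is dominated by the largest categories."""
    hits = sum(h for h, _ in tallies.values())
    total = sum(n for _, n in tallies.values())
    return 100.0 * hits / total


def balanced_percentage(tallies):
    """Average the per-category percentages so each category counts equally."""
    per_category = [100.0 * h / n for h, n in tallies.values()]
    return sum(per_category) / len(per_category)


print(round(overall_percentage(tallies), 1))
print(round(balanced_percentage(tallies), 1))
```

With these invented counts the pooled percentage is noticeably higher than the balanced one, because the large, well-predicted category 4 group masks the weaker performance on the small, high-potency groups.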

Both statistical-based and expert rule-based methodologies were individually assessed and able to predict either the correct category or a more conservative category for over 90% of the chemicals (where a prediction was made) (see Table S7), with a balanced statistic of over 73% (see supplemental tables S3 and S6). A consensus prediction from both methodologies was also calculated and this prediction had the highest score for correct or more conservative. The statistical-based model was more accurate with approximately 80% of the chemicals being correctly predicted or predicted to be in an adjacent class (either higher or lower), with the same balanced statistic value of 80% (see supplemental table S3). Therefore, all three results could be used in different ways for classification and labelling. For example, the consensus prediction may be used, in a regulatory context, to assess what GHS category to use based on the model results (since this is most conservative); however, the statistical-based model may provide more weight to determine whether additional testing is warranted (since this is the more accurate model). Hence, these models could be utilized in different manners depending on the final intended use of the prediction: screening or classification. Although predicting a more conservative value is protective of public health, other considerations (such as the cost of supporting a category 1 assignment) may also influence whether additional testing is needed for those chemicals predicted in the most toxic categories. The summary statistics include an assessment of the prediction of the correct or one more conservative category to support these decisions.

The consensus model predictions were investigated across different industries, i.e., pharmaceutical, plant protection product and other industrial chemicals, including specialty chemicals (see Supplemental tables S8-S19). The proportion of correct or more conservative predictions across all three sources of data was greater than 93% (see supplemental Table S19), indicating a high reliability for the (Q)SAR models (with balanced statistics greater than 75%, after excluding an unreliable statistic for pharmaceutical category 1 compounds based on only two datapoints; see supplemental tables S10, S15 and S18).

Several AOT in silico models have been assessed as part of a publication by Graham and co-authors (Graham et al., 2020). That paper illustrates the accuracy, reliability, and applicability of these models in the pharmaceutical chemical space. Graham et al. also elucidate how to utilize these models to fill in data gaps, inform decisions regarding Dangerous Goods classifications and to reduce animal use and reliance on animal test methods for acute oral toxicity GHS categorization.

Regulatory experience of using (Q)SARs

Other research and development as well as regulatory use cases have successfully incorporated (Q)SAR model results in place of in vivo and in vitro studies. For example, the ICH M7 regulatory guideline (ICH M7 2017) and the EFSA guidance (EFSA 2016) recommend, for certain kinds of chemical species, the use of two complementary (Q)SAR models, one statistical-based and one expert rule-based. (Q)SAR model results alongside an expert review are accepted as part of regulatory submissions as per these guidelines. Where a mutagenic (Q)SAR prediction is made, it is possible to follow up this finding with an Ames test, and a negative result of this in vitro test would then override any positive (Q)SAR prediction. This mirrors the findings in this paper, where experimental evidence considered during an expert review can modify a (Q)SAR prediction.

Regulatory acceptance has also provided impetus for the development of improved models for predicting bacterial mutation. Landry et al. (2019) show how, based on a larger training set and improved knowledge of mutagenicity SAR, improvements to both the sensitivity and specificity of the models have been made. A series of papers have been written outlining best practices in the application of the models alongside guidelines for performing an expert review of the results (Powley 2015; Barber 2015; Amberg 2016; Amberg 2019). In addition, predictions within specific classes, such as nitrosamines (Bercu et al., 2020) and aromatic primary amines (Ahlberg et al., 2016), are still challenging and are the focus of active research and development. These classes also require expert review. This situation parallels the findings of this paper, where specific classes, such as reactive fluorinated substances, were singled out for a more in-depth expert review in supplemental Table S22.

Use cases and workflows

A (Q)SAR assessment of AOT could be utilized to support both transportation and worker safety assessments as well as emergency over-dose situations (e.g. poison control) or health hazard assessments for large-scale spills of chemicals that lack AOT data. An example flow-chart for making an AOT GHS assessment based, in part, on (Q)SAR models is shown in Figure 5.

Figure 5:

Establishing a GHS classification based on available data and (Q)SAR model results

The first step for any test chemical is to identify whether AOT data are available, either within proprietary databases or through a search of publicly available information. Such a search should return chemicals that match the chemical exactly (including different salt forms, tautomers, etc.). In many situations, the test chemical cannot be submitted to an online service because of intellectual property concerns and so issuing such a query behind a company’s firewall is often important. Ideally, such a search will return an adequate AOT study, including information on species tested, route of exposure, and LD50 value. Studies not considered adequate or performed via other routes of exposure may be considered as part of the weight of the evidence in the expert review, discussed later. Further assessment of any available studies should focus on whether they are reliable including consideration of whether the Klimisch score (Klimisch et al., 1997) is 1 or 2 (i.e., a well-documented and accepted/sufficient study or data from the literature that is performed according to or partially compliant with valid and/or accepted test guidelines, and preferably performed according to good laboratory practices). Regarding the species tested, the rat is the preferred test species (because of the similarity of the genome between rats and humans); however, if AOT data are available in other animal species, expert scientific judgment should be used to select the most appropriate LD50. Such LD50 data could be used directly to assign the chemical to a GHS category.
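Assigning a GHS category from an oral LD50 value is a simple threshold lookup. The sketch below uses the acute oral toxicity cut-offs of the UN GHS as commonly stated (5, 50, 300, 2000 and 5000 mg/kg body weight); these should be checked against the applicable GHS revision and jurisdiction, since some regulations do not implement category 5. The function name `ghs_category_from_ld50` is a hypothetical name introduced for illustration.

```python
# Upper bounds (mg/kg body weight, inclusive) for the GHS acute oral toxicity categories.
# Values assumed from the UN GHS; verify against the revision that applies to your use case.
GHS_ORAL_UPPER_BOUNDS = [
    (5.0, "Cat. 1"),
    (50.0, "Cat. 2"),
    (300.0, "Cat. 3"),
    (2000.0, "Cat. 4"),
    (5000.0, "Cat. 5"),
]


def ghs_category_from_ld50(ld50_mg_per_kg):
    """Map an oral LD50 (mg/kg body weight) to a GHS category label."""
    if ld50_mg_per_kg <= 0:
        raise ValueError("LD50 must be a positive number")
    for upper_bound, category in GHS_ORAL_UPPER_BOUNDS:
        if ld50_mg_per_kg <= upper_bound:
            return category
    return "NC"  # not classified: LD50 above the highest cut-off


print(ghs_category_from_ld50(250.0))
```

The lookup only covers the final step; selecting which LD50 to use (species, study reliability, Klimisch score) remains a matter of expert judgment as described above.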

In situations where there are no AOT studies available, it may be possible to use repeated dose studies to derive an estimate for LD50 or to separate non-classified chemicals from those requiring follow-up (Bulgheroni et al., 2009; Graepel et al., 2016).

In the absence of reliable AOT experimental data or the ability to derive an LD50 value from other information, a (Q)SAR assessment provides an alternative approach to estimate the GHS category. In this paper, we observed that accepting the prediction with the most toxic outcome from two complementary (Q)SAR methodologies provided the most conservative overall results, which is desirable to protect health. Further, the importance of conducting an expert review for any assessment is recognized. Such a review may take into consideration other information on the chemical’s MoA, other hazardous properties, chemical analogs (i.e., read-across), inspection of the individual model’s results, and any mechanistic information including the potential of the chemical to metabolize. In addition, AOT data not deemed to be sufficiently reliable, on their own, may be included in a weight-of-evidence assessment. The expert review process should generate a documented assessment of the assigned GHS category. It may be possible to generate an assessment even if the (Q)SAR models are unable to predict a GHS category, such as when a chemical is out-of-domain (i.e., the prediction reliability is expected to be lower) or indeterminate (i.e., there is conflicting information) based on sufficient additional information.
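The "most toxic outcome" rule for combining two complementary (Q)SAR methodologies can be sketched as follows. This is a minimal illustration rather than the implementation used in the deployed models: the function name `consensus_category` is hypothetical, `None` is used here to represent an inconclusive (out-of-domain or indeterminate) result, and falling back to the single available prediction when one methodology is inconclusive is an assumption made for this sketch.

```python
from typing import Optional

# GHS categories ordered from most to least toxic.
ORDER = ["Cat. 1", "Cat. 2", "Cat. 3", "Cat. 4", "Cat. 5", "NC"]


def consensus_category(statistical: Optional[str], expert_rule: Optional[str]) -> Optional[str]:
    """Return the more toxic of two predicted categories.

    None stands for an inconclusive prediction (assumption of this sketch).
    If both predictions are inconclusive, the consensus is inconclusive too.
    """
    available = [p for p in (statistical, expert_rule) if p is not None]
    if not available:
        return None
    # The lowest index in ORDER is the most toxic, i.e. the most conservative choice.
    return min(available, key=ORDER.index)


print(consensus_category("Cat. 3", "Cat. 2"))
```

Any consensus produced this way would still be subject to the expert review described above before a GHS category is assigned.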

Finally, there may be situations when the (Q)SAR results and expert review do not result in a GHS classification, primarily due to insufficient information. In this situation, another option for hazard identification should be considered.
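The sequence of steps described for Figure 5 amounts to a prioritized fallback: reliable experimental AOT data first, then an estimate derived from other studies, then a (Q)SAR assessment with expert review. The sketch below is a simplified reading of that workflow; the function name, argument names, and source labels are hypothetical, and each argument is assumed to be either a GHS category label or `None` when that line of evidence is unavailable or inconclusive.

```python
from typing import Optional, Tuple


def assign_ghs_category(
    experimental: Optional[str] = None,
    derived_from_other_studies: Optional[str] = None,
    qsar_with_expert_review: Optional[str] = None,
) -> Tuple[Optional[str], str]:
    """Return (category, source) using the first available line of evidence."""
    candidates = (
        ("reliable AOT study", experimental),
        ("estimate from other studies", derived_from_other_studies),
        ("(Q)SAR models plus expert review", qsar_with_expert_review),
    )
    for source, category in candidates:
        if category is not None:
            return category, source
    # No classification could be reached from the available information.
    return None, "consider another option for hazard identification"


print(assign_ghs_category(qsar_with_expert_review="Cat. 3"))
```

In practice the expert review may also draw on studies that were not adequate on their own, so the three inputs are less independent than this simplified cascade suggests.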

In addition to their use in establishing a GHS classification, these types of models also have utility for other applications. For example, in early stage research and development (R&D) they could be used as a guide for relative acute toxicity risk and used to help design testing strategies as well as to inform compound design and selection. Different use cases for acute toxicity computational models are also outlined as part of the in silico protocol for acute toxicity.

Conclusions

As the current standard for acute oral toxicity hazard identification is a test conducted using animals, an AOT in silico model potentially offers a rapid and cost-effective alternative approach. In silico models have the potential to effectively reduce or eliminate the use of in vivo testing, thereby reducing the reliance of industry on animal models for AOT hazard identification. Given that in silico models have been developed based on the wealth of publicly available AOT data, it is promising to note that the Leadscope AOT suite was capable of making typically reliable AOT hazard predictions for a broad range of chemical structures, spanning numerous industries. The evaluation presented in this manuscript also points out the importance of an expert review to enable a weight-of-evidence approach. Guidance is also provided on the use of such models to fulfill regulatory requirements, classification and labelling, and transportation needs. In addition, other uses for such models include prioritization and screening of chemicals in early R&D. It can be concluded that for predicting acute toxicity, the use of qualified and transparent (Q)SAR models, such as the Leadscope AOT suite, coupled with an expert review, provides a scientifically rational, reasonable and conservative approach to hazard identification.

Supplementary Material

1

Highlights.

  • Summarizes the use of an acute oral toxicity test for classification and labelling

  • Highlights how an alternative method would support the 3Rs

  • Presents the performance results for AOT in silico models against over 2,000 proprietary and marketed chemicals

  • Discusses factors to consider as part of an expert review

  • Outlines a workflow to incorporate in silico models to support GHS classification and labelling

Acknowledgements

Research reported in this publication was supported by the National Institute of Environmental Health Sciences of the National Institutes of Health under Award Number R44ES026909. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. Pierre Ferrer was supported, in part, through the NIH training grant T32 ES026568.

Footnotes

3. Any example workflow or guidance outlined in this paper is not currently endorsed or approved by Syngenta.

4. The term “(Q)SAR” is an acronym for computational models that predict a biological response (such as acute toxicity) based on the chemical structure of the test molecule. It refers to both quantitative and non-quantitative structure-activity relationships by placing the “Q” in brackets.

Declaration of interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

References

  1. 49 CFR, Part 178. Code of Federal Regulations. Title: PART 178 - SPECIFICATIONS FOR PACKAGINGS. https://www.govinfo.gov/content/pkg/CFR-2019-title49-vol3/xml/CFR-2019-title49-vol3-part178.xml
  2. 16 CFR 1500.3. Code of Federal Regulations. Federal Hazardous Substances Act (FHSA) Requirements. https://www.cpsc.gov/Business--Manufacturing/Business-Education/Business-Guidance/FHSA-Requirements
  3. Ahlberg E, Amberg A, Beilke LD, Bower D, Cross KP, Custer L, Ford KA, Gompel JV, Harvey J, Honma M, Jolly R, Joossens E, Kemper RA, Kenyon M, Kruhlak N, Kuhnke L, Leavitt P, Naven R, Neilan C, Quigley DP, Shuey D, Spirkl H-P, Stavitskaya L, Teasdale A, White A, Wichard J, Zwickl C, Myatt GJ, 2016. Extending (Q)SARs to incorporate proprietary knowledge for regulatory purposes: A case study using aromatic amine mutagenicity. Regulatory Toxicology and Pharmacology 77, 1–12. doi: 10.1016/j.yrtph.2016.02.003
  4. Allen CHG, Mervin LH, Mahmoud SY, Bender A, 2019. Leveraging heterogeneous data from GHS toxicity annotations, molecular and protein target descriptors and Tox21 assay readouts to predict and rationalise acute toxicity. J Cheminform 11, 36. doi: 10.1186/s13321-019-0356-5
  5. Amberg A, Beilke L, Bercu J, Bower D, Brigo A, Cross KP, Custer L, Dobo K, Dowdy E, Ford KA, Glowienke S, Gompel JV, Harvey J, Hasselgren C, Honma M, Jolly R, Kemper R, Kenyon M, Kruhlak N, Leavitt P, Miller S, Muster W, Nicolette J, Plaper A, Powley M, Quigley DP, Reddy MV, Spirkl H-P, Stavitskaya L, Teasdale A, Weiner S, Welch DS, White A, Wichard J, Myatt GJ, 2016. Principles and procedures for implementation of ICH M7 recommended (Q)SAR analyses. Regulatory Toxicology and Pharmacology 77, 13–24. doi: 10.1016/j.yrtph.2016.02.004
  6. Amberg A, Andaya RV, Anger LT, Barber C, Beilke L, Bercu J, Bower D, Brigo A, Cammerer Z, Cross KP, Custer L, Dobo K, Gerets H, Gervais V, Glowienke S, Gomez S, Gompel JV, Harvey J, Hasselgren C, Honma M, Johnson C, Jolly R, Kemper R, Kenyon M, Kruhlak N, Leavitt P, Miller S, Muster W, Naven R, Nicolette J, Parenty A, Powley M, Quigley DP, Reddy MV, Sasaki JC, Stavitskaya L, Teasdale A, Trejo-Martin A, Weiner S, Welch DS, White A, Wichard J, Woolley D, Myatt GJ, 2019. Principles and procedures for handling out-of-domain and indeterminate results as part of ICH M7 recommended (Q)SAR analyses. Regulatory Toxicology and Pharmacology 102, 53–64. doi: 10.1016/j.yrtph.2018.12.007
  7. AnimalResearch.Info 2018: http://www.animalresearch.info/en/designing-research/alternatives-and-3rs/
  8. Barber C, Amberg A, Custer L, Dobo KL, Glowienke S, Gompel JV, Gutsell S, Harvey J, Honma M, Kenyon MO, Kruhlak N, Muster W, Stavitskaya L, Teasdale A, Vessey J, Wichard J, 2015. Establishing best practise in the application of expert review of mutagenicity under ICH M7. Regulatory Toxicology and Pharmacology 73, 367–377. doi: 10.1016/j.yrtph.2015.07.018
  9. Bercu et al., Compound- and Class-Specific Limits for Common Impurities in Pharmaceuticals, currently being prepared
  10. 49 CFR, Part 178. Volume: 3. Date: 2019-10-01. Original Date: 2019-10-01. Title: PART 178 - SPECIFICATIONS FOR PACKAGINGS. Context: Title 49 - Transportation. Subtitle B - Other Regulations Relating to Transportation (Continued). CHAPTER I - PIPELINE AND HAZARDOUS MATERIALS SAFETY ADMINISTRATION, DEPARTMENT OF TRANSPORTATION (CONTINUED). SUBCHAPTER C - HAZARDOUS MATERIALS REGULATIONS (CONTINUED).
  11. Creton S, Dewhurst IC, Earl LK, Gehen SC, Guest RL, Hotchkiss JA, Indans I, Woolhiser MR, Billington R, 2009. Acute toxicity testing of chemicals - Opportunities to avoid redundant testing and use alternative approaches. Critical Reviews in Toxicology 40, 50–83. doi: 10.3109/10408440903401511
  12. Drwal MN, Banerjee P, Dunkel M, Wettig MR, Preissner R, 2014. ProTox: a web server for the in silico prediction of rodent oral toxicity. Nucleic Acids Research 42. doi: 10.1093/nar/gku401
  13. ECHA 2008. Guidance on information requirements and chemical safety assessment Chapter R.6: QSARs and grouping of chemicals. https://echa.europa.eu/documents/10162/13632/information_requirements_r6_en.pdf/77f49f81-b76d-40ab-8513-4f3a533b6ac9
  14. ECHA 2015. Guidance on the Application of the CLP Criteria Guidance to Regulation (EC) No 1272/2008 on classification, labelling and packaging (CLP) of substances and mixtures. https://echa.europa.eu/documents/10162/23036412/clp_en.pdf/58b5dc6d-ac2a-4910-9702-e9e1f5051cc5
  15. EFPIA 2019. Putting Animal Welfare Principles and 3Rs into Action - European Pharmaceutical Industry Report. 2019 Update. European Federation of Pharmaceutical Industries and Associations (EFPIA).
  16. EFSA 2016. Guidance on the establishment of the residue definition for dietary risk assessment. doi: 10.2903/j.efsa.2016.4549
  17. EU Directive 2010/63/EU. https://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=OJ:L:2010:276:0033:0079:EN:PDF
  18. EU 2006. Regulation (EC) No 1907/2006 of the European Parliament and of the council of 18 December 2006 concerning the Registration, Evaluation, Authorisation and Restriction of Chemicals (REACH). http://eur-lex.europa.eu/legal-content/EN/TXT/PDF/?uri=CELEX:02006R1907-20161011&from=EN
  19. EU 2008. Regulation (EC) No 1272/2008 of the European Parliament and of the Council of 16 December 2008 on classification, labelling and packaging of substances and mixtures, amending and repealing Directives 67/548/EEC and 1999/45/EC, and amending Regulation (EC) No 1907/2006 https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX:32008R1272
  20. EU 2017. Guidance on the Application of the CLP Criteria Guidance to Regulation (EC) No 1272/2008 on classification, labelling and packaging (CLP) of substances and mixtures Version 5.0 July 2017. https://echa.europa.eu/documents/10162/23036412/clp_en.pdf/58b5dc6d-ac2a-4910-9702-e9e1f5051cc5
  21. Erhirhie EO, Ihekwereme CP, Ilodigwe EE, 2018. Advances in acute toxicity testing: strengths, weaknesses and regulatory acceptance. Interdisciplinary Toxicology 11, 5–12. doi: 10.2478/intox-2018-0001
  22. FIFRA 1996. Federal Insecticide, Fungicide, and Rodenticide Act. https://www.epa.gov/laws-regulations/summary-federal-insecticide-fungicide-and-rodenticide-act
  23. GHS additivity formula. A true replacement method for acute systemic toxicity testing of agrochemical formulations (Corvaro 2016)
  24. Graham J, Rodas M, Hillegass J, Schulze G, 2020. The performance, reliability and potential application of in silico models for predicting the acute oral toxicity of pharmaceutical compounds. Regulatory Toxicology and Pharmacology (2020): 104816.
  25. Hamm J, Sullivan K, Clippinger AJ, Strickland J, Bell S, Bhhatarai B, Blaauboer B, Casey W, Dorman D, Forsby A, Garcia-Reyero N, Gehen S, Graepel R, Hotchkiss J, Lowit A, Matheson J, Reaves E, Scarano L, Sprankle C, Tunkel J, Wilson D, Xia M, Zhu H, Allen D, 2017. Alternative approaches for identifying acute systemic toxicity: Moving from research to regulatory testing. Toxicology in Vitro 41, 245–259. doi: 10.1016/j.tiv.2017.01.004
  26. IATA 202. https://www.iata.org/
  27. ICH M7, 2017. (R1). Assessment and control of DNA reactive (mutagenic) impurities in pharmaceuticals to limit potential carcinogenic risk http://www.ich.org/fileadmin/Public_Web_Site/ICH_Products/Guidelines/Multidisciplinary/M7/M7_R1_Addendum_Step_4_31Mar2017.pdf.
  28. Karmaus AL (National Toxicology Program). Rat Oral Acute Toxicity Database and Evaluation of Variability. Predictive Models for Acute Oral System Toxicity Workshop. April 11, 2018. presentation.
  29. Klimisch HJ, Andreae M, Tillmann U, 1997. A systematic approach for evaluating the quality of experimental toxicological and ecotoxicological data. Regul. Toxicol. Pharmacol 25, 1–5. doi: 10.1006/rtph.1996.1076
  30. Kleinstreuer NC, Karmaus AL, Mansouri K, Allen DG, Fitzpatrick JM, Patlewicz G, 2018. Predictive models for acute oral systemic toxicity: A workshop to bridge the gap from research to regulation. Computational Toxicology 8, 21–24. doi: 10.1016/j.comtox.2018.08.002
  31. Landry C, Kim MT, Kruhlak NL, Cross KP, Saiakhov R, Chakravarti S, Stavitskaya L, 2019. Transitioning to composite bacterial mutagenicity models in ICH M7 (Q)SAR analyses. Regulatory Toxicology and Pharmacology 109, 104488. doi: 10.1016/j.yrtph.2019.104488
  32. Lapenna S, Fuart-Gatnik M, Worth A. Review of QSAR Models and Software Tools for Predicting Acute and Chronic Systemic Toxicity. Luxembourg: Office of the European Union; 2010. (JRC Technical Report EUR 24639 EN).
  33. Lautenberg Chemical Safety Act (2016). https://www.epa.gov/assessing-and-managing-chemicals-under-tsca/frank-r-lautenberg-chemical-safety-21st-century-act
  34. Leadscope 2020. https://www.leadscope.com/index.php
  35. Lewis KA, Tzilivakis J, Warner D and Green A (2016). An international database for pesticide risk assessments and management. Human and Ecological Risk Assessment: An International Journal, 22(4), 1050–1064. doi: 10.1080/10807039.2015.1133242
  36. Myatt G, Beilke L, Cross K, 2017. In Silico Tools and their Application. Comprehensive Medicinal Chemistry III 156–176. doi: 10.1016/b978-0-12-409547-2.12379-0
  37. Myatt GJ, Ahlberg E, Akahori Y, Allen D, Amberg A, Anger LT, Aptula A, Auerbach S, Beilke L, Bellion P, Benigni R, Bercu J, Booth ED, Bower D, Brigo A, Burden N, Cammerer Z, Cronin MT, Cross KP, Custer L, Dettwiler M, Dobo K, Ford KA, Fortin MC, Gad-Mcdonald SE, Gellatly N, Gervais V, Glover KP, Glowienke S, Gompel JV, Gutsell S, Hardy B, Harvey JS, Hillegass J, Honma M, Hsieh J-H, Hsu C-W, Hughes K, Johnson C, Jolly R, Jones D, Kemper R, Kenyon MO, Kim MT, Kruhlak NL, Kulkarni SA, Kümmerer K, Leavitt P, Majer B, Masten S, Miller S, Moser J, Mumtaz M, Muster W, Neilson L, Oprea TI, Patlewicz G, Paulino A, Piparo EL, Powley M, Quigley DP, Reddy MV, Richarz A-N, Ruiz P, Schilter B, Serafimova R, Simpson W, Stavitskaya L, Stidl R, Suarez-Rodriguez D, Szabo DT, Teasdale A, Trejo-Martin A, Valentin J-P, Vuorinen A, Wall BA, Watts P, White AT, Wichard J, Witt KL, Woolley A, Woolley D, Zwickl C, Hasselgren C, 2018. In silico toxicology protocols. Regulatory Toxicology and Pharmacology 96, 1–17. doi: 10.1016/j.yrtph.2018.04.014
  38. Myatt GJ, Bower D, Cross K, Johnson C, Quigley DQ, Tice R, Zwickl C, In silico acute toxicity protocols and models (Poster #747), 55th Congress of the European Societies of Toxicology, Helsinki, Finland 8-11 September 2019
  39. NASEM 2015. Application of modern toxicology approaches for predicting acute toxicity for chemical defense, 2015. National Academies Press, Washington, D.C. https://www.ncbi.nlm.nih.gov/books/NBK321419/#sec_000055
  40. NC3Rs (2020). National Centre for the Replacement Refinement & Reduction of Animals in Research (NC3Rs): https://www.nc3rs.org.uk/3rs-toxicology-and-regulatory-sciences
  41. OECD (2008), Test No. 425: Acute Oral Toxicity: Up-and-Down Procedure, OECD Guidelines for the Testing of Chemicals, Section 4, OECD Publishing, Paris. doi: 10.1787/9789264071049-en
  42. OECD (2002a), Test No. 420: Acute Oral Toxicity - Fixed Dose Procedure, OECD Guidelines for the Testing of Chemicals, Section 4, OECD Publishing, Paris. doi: 10.1787/9789264070943-en
  43. OECD (2002b), Test No. 423: Acute Oral toxicity - Acute Toxic Class Method, OECD Guidelines for the Testing of Chemicals, Section 4, OECD Publishing, Paris. doi: 10.1787/9789264071001-en
  44. OECD 2010. Guidance document on using cytotoxicity tests to estimate starting doses for acute oral systemic toxicity tests. 20-July-2010. Series on Testing and Assessment No. 129 http://www.oecd.org/officialdocuments/publicdisplaydocumentpdf/?cote=env/jm/mono(2010)20&doclanguage=en
  45. OECD 2001. OECD series on testing and assessment number 24 guidance document on acute oral toxicity testing https://ntp.niehs.nih.gov/iccvam/suppdocs/feddocs/oecd/oecd-gd24.pdf
  46. Pham LL, Watford SM, Pradeep P, Martin M, Thomas R, Judson RS, Setzer RW, Friedman KP, 2020. Variability in in vivo studies: Defining the upper limit of performance for predictions of systemic effect levels. Computational Toxicology 15, 100126. doi: 10.1016/j.comtox.2020.100126
  47. Powley MW, 2015. (Q)SAR assessments of potentially mutagenic impurities: A regulatory perspective on the utility of expert knowledge and data submission. Regulatory Toxicology and Pharmacology 71, 295–300. doi: 10.1016/j.yrtph.2014.12.012
  48. RTECS 2011. http://www.cdc.gov/niosh/rtecs/default.html
  49. Russell WMS and Burch RL, 1959. The principles of humane experimental technique. Universities Federation for Animal Welfare, Wheathampstead.
  50. Schrage A, Hempel K, Schulz M, Kolle SN, van Ravenzwaay B, Landsiedel R. Refinement and Reduction of Acute Oral Toxicity Testing: A Critical Review of the Use of Cytotoxicity Data. Altern Lab Anim. 2011 July;39(3):273–95. doi: 10.1177/026119291103900311
  51. Strickland J, Clippinger AJ, Brown J, Allen D, Jacobs A, Matheson J, Lowit A, Reinke EN, Johnson MS, Quinn MJ, Mattie D, Fitzpatrick SC, Ahir S, Kleinstreuer N, Casey W, 2018. Status of acute systemic toxicity testing requirements and data uses by U.S. regulatory agencies. Regulatory Toxicology and Pharmacology 94, 183–196. doi: 10.1016/j.yrtph.2018.01.022
  52. Tox21 (2008). https://www.epa.gov/chemical-research/toxicology-testing-21st-century-tox21
  53. TSCA 2016. Toxic Substances Control Act (TSCA). https://www.congress.gov/bill/114th-congress/senate-bill/697/all-info
  54. Yang C, Cross K, Myatt GJ, Blower PE, Rathman JF. Building Predictive Models for Protein Tyrosine Phosphatase 1B Inhibitors Based on Discriminating Structural Features by Reassembling Medicinal Chemistry Building Blocks. J. Med. Chem 2004, 47, 5984–5994
  55. UN GHS 2005. Globally Harmonized System of Classification and Labelling of Chemicals (GHS) (“The Purple Book”), United Nations, 2005 First Revised Edition, available at www.unece.org/trans/danger/publi/ghs/ghs_rev01/01files_e.html or from United Nations Publications (publications@un.org)
  56. UN 2019a. United Nations Globally Harmonized System of Classification and Labelling of Chemicals Eighth Revised Edition (2019)
  57. UN 2019b. United Nations Recommendations on the Transport of Dangerous Goods 21st Revised Edition (2019)
