Author manuscript; available in PMC: 2022 May 1.
Published in final edited form as: Comput Toxicol. 2021 May 1;18:10.1016/j.comtox.2021.100167. doi: 10.1016/j.comtox.2021.100167

Evaluation of Existing QSAR Models and Structural Alerts and Development of New Ensemble Models for Genotoxicity Using a Newly Compiled Experimental Dataset

Prachi Pradeep a,b, Richard Judson b, David M DeMarini b, Nagalakshmi Keshava c, Todd M Martin c, Jeffry Dean d, Catherine F Gibbons e, Anita Simha f, Sarah H Warren b, Maureen R Gwinn b, Grace Patlewicz b,*
PMCID: PMC8422876  NIHMSID: NIHMS1719707  PMID: 34504984

Abstract

Regulatory agencies world-wide face the challenge of performing risk-based prioritization of thousands of substances in commerce. In this study, a major effort was undertaken to compile a large genotoxicity dataset (54,805 records for 9299 substances) from several public sources (e.g., TOXNET, COSMOS, eChemPortal). The names and outcomes of the different assays were harmonized, and assays were annotated by type: gene mutation in Salmonella bacteria (Ames assay) and chromosome mutation (clastogenicity) in vitro or in vivo (chromosome aberration, micronucleus, and mouse lymphoma Tk+/− assays). This dataset was then evaluated to assess genotoxic potential using a categorization scheme, whereby a substance was considered genotoxic if it was positive in at least one Ames or clastogen study. The categorization dataset comprised 8442 chemicals, of which 2728 chemicals were genotoxic, 5585 were not and 129 were inconclusive. QSAR models (TEST and VEGA) and the OECD Toolbox structural alerts/profilers (e.g., OASIS DNA alerts for Ames and chromosomal aberrations) were used to make in silico predictions of genotoxicity potential. The performance of the individual QSAR tools and structural alerts resulted in balanced accuracies of 57–73%. A Naïve Bayes consensus model was developed using combinations of QSAR models and structural alert predictions. The ‘best’ consensus model selected had a balanced accuracy of 81.2%, a sensitivity of 87.24% and a specificity of 75.20%. This in silico scheme offers promise as a first step in ranking thousands of substances as part of a prioritization approach for genotoxicity.

Keywords: genotoxicity, Ames, clastogenicity, QSAR, structural alert, risk-based prioritization, TSCA

1. Introduction

The Lautenberg Chemical Safety Act for the 21st Century [1], which amends the Toxic Substances Control Act (TSCA) (1976), requires that the US Environmental Protection Agency (US EPA) make risk-based prioritization for the tens of thousands of substances in commerce. The TSCA Inventory contains 86,405 chemicals of which 41,484 are active in US commerce. (https://www.epa.gov/chemicals-under-tsca/now-available-latest-update-tsca-inventory). For high-priority substances, US EPA will then develop risk evaluations that integrate toxicity data with exposure information derived from intended conditions of use. Risk-based prioritization forms an important component of regulatory frameworks worldwide. In recent years, there has been a concerted effort to consider alternatives such as high throughput screening (HTS) data, the use of Thresholds of Toxicological Concern (TTC), (Quantitative) Structure activity relationships and read-across in lieu of traditional toxicity data to provide the information needed to facilitate risk-based prioritization [2,3].

Genotoxicity assessment is an important component of understanding the hazards associated with new and existing chemicals and is included in the analysis of chemicals under a variety of national and international mandates [4–7]. Genotoxicity refers to the ability of chemicals to induce DNA damage, such as DNA strand breaks or DNA adducts and/or mutations, such as somatic and heritable changes in DNA sequence. Many studies have assessed the combinations of in vitro, in vivo, and in silico genotoxicity tests that are most able to capture the range of known genotoxicants or rodent carcinogens, resulting in a testing scheme recommended by the Organization for Economic Cooperation and Development (OECD) Genetic Toxicology Test Guidelines [8,9], the International Conference on Harmonization (ICH) [10], and the National Toxicology Program (NTP) [11].

Regulatory bodies in the US, such as the EPA and Food and Drug Administration (FDA), recommend a number of OECD genetic toxicology guidelines [9]. These guidelines include a set of bacterial assays to detect gene mutations using various Salmonella strains and Escherichia coli WP2 (TG471), and assays to detect chromosomal mutation (also called clastogenicity), including the in vitro chromosome aberration (CA) (TG473), in vitro micronucleus (MN) (TG487), in vivo mouse bone-marrow or erythrocyte MN (TG475, TG474), and the in vitro mouse lymphoma Tk+/− assays (TG476) [12–15]. The rationale for utilizing a combination of assays is to detect all types of genotoxic agents, whether they induce primarily gene mutations, chromosomal mutations, or aneuploidy.

Recently, Williams et al. [16] assembled a proprietary database of >10,000 compounds drawn from the Leadscope and Lhasa Limited databases and performed an analysis to determine if some of the eight OECD TG471-recommended bacterial mutagenicity assays were redundant. The authors found that the chemicals that were mutagenic in Salmonella strains TA98 and/or TA100 accounted for 93% of the mutagens identified using up to eight of the bacterial strains of Salmonella and E. coli recommended by TG471. The addition of those chemicals that were negative in TA98 and TA100 but positive in at least one clastogenesis assay accounted for 99% of the mutagens identified using the entire set of TG471-recommended bacterial strains [5]. Thus, there was considerable redundancy among the TG471-recommended bacterial strains, and the results of Williams et al. [16] supported the evaluation of a chemical as mutagenic if it was positive in Salmonella TA98 and/or TA100 and/or one clastogenesis assay. The authors proposed that “Salmonella strains TA1535, TA1537, TA102, and E. coli strain WP2 uvrA could be removed from those recommended in OECD TG471 with little, if any, loss of sensitivity for the detection of bacterial mutagens.” [16].

In the absence of experimental data, genotoxicity may be predicted using in silico (quantitative) structure activity relationship (QSAR) models for Ames mutagenicity and clastogenicity. Several reviews describe the state of the art of in silico genotoxicity models that have been published to date [17–22]. The integration of in silico models with other information has been discussed elsewhere [23, 24].

The rationale underlying the current study was to develop and evaluate a scheme for prioritizing thousands of chemicals based on their genotoxicity potential. Consistent with the work by Williams et al. (2019) [16], a chemical was categorized as having potential genotoxicity if any single Ames or clastogen study was positive. Thus, the overall scheme would be conservative and therefore protective of human health. For example, a chemical with 10 Ames results where all but one were negative would still be categorized overall as ‘Ames positive’. Throughout the manuscript this scheme is referred to as ‘the categorization scheme’.

For this analysis, a new non-proprietary experimental dataset extracted from various public sources comprising 9299 unique substances was assembled. The categorization scheme was applied to this dataset to establish a baseline in performance. Next, the predictive ability of two publicly available QSAR software tools and 6 structural alert schemes as implemented in the OECD Toolbox (described further in the Methods) was evaluated against this genotoxicity categorization scheme. The QSAR models and structural alerts were then employed in a Naïve Bayes algorithm to develop an ensemble model for genotoxicity prediction. The expectation was that such an ensemble model would leverage the relative strengths of the individual models to produce a more robust model to assess genotoxicity potential. Figure 1 outlines the overall workflow used in this study.

Figure 1.


Outline of the workflow process adopted for this analysis.

The genotoxicity dataset compiled permitted other research questions to be asked, in particular how often the outcomes from a single Ames and clastogen study would result in a health-conservative call, particularly since these 2 assays would be the most likely to be conducted for new substances. As described in Williams et al. [16], if a single Ames assay was positive, a chemical was categorized as genotoxic, even if there were one or more negative Ames assay calls. Further, if all Ames assays for the chemical were negative, but at least one clastogen assay was positive, the chemical would be still categorized as genotoxic regardless of the presence of one or more negative clastogen assays.

2. Methods

2.1. Dataset

Chemicals were evaluated for their genotoxic potential based on results in the Salmonella (Ames) bacterial mutagenicity assay [25] and any of the three chromosome mutation assays as described in OECD guidelines [26–30]: in vivo/in vitro micronucleus (MN) or chromosome aberration (CA) assays and the mouse lymphoma Tk+/− assay. The dataset developed was compiled from several sources. The five main sources are detailed below:

  1. COSMOS [http://www.cosmostox.eu/what/COSMOSdb/] This is a collection of experimental data on chemical hazard primarily for cosmetics ingredients. The dataset contained 4403 records from COSMOS [re-downloaded 2021-02-03].

  2. TOXNET (CCRIS and GENE-TOX) [https://www.nlm.nih.gov/toxnet/index.html]: these genotoxicity databases from the National Library of Medicine (NLM) are no longer actively supported but can still be accessed indirectly through PubChem. The dataset contained 5986 records [re-downloaded 2021-02-03].

  3. eChemPortal [https://www.echemportal.org/echemportal/] is a portal managed by OECD that contains information from many different Member Country Agencies. It also includes information extracted from EU REACH dossiers. Most of the data used in this study, 38,453 records, were taken from eChemPortal [re-downloaded 2021-02-03].

  4. National Toxicology Program (NTP) [https://manticore.niehs.nih.gov/datasets/search/trf] houses a bioassay genetox conclusion dataset which provides treatment-related findings including bioassay genetox conclusions from chronic bioassay level of evidence, bacterial mutagenicity, micronucleus, Tox21 and comet assay. The dataset contained 2904 records [Downloaded 2021-02-03].

  5. EURL ECVAM Genotoxicity and Carcinogenicity Consolidated Database of Ames Positive Chemicals [https://ec.europa.eu/jrc/en/scientific-tool/eurl-ecvam-genotoxicity-and-carcinogenicity-consolidated-database-ames-positive-chemicals] is a structured database that compiled available genotoxicity and carcinogenicity data for Ames positive chemicals originating from a number of different sources. The dataset contained 3059 records. [Downloaded 2021-02-03].

Table 1 lists the number of substances that overlap between data sources on the basis of their substance identifier, DTXSID. The DTXSID is the DSSTox Substance identifier used in the EPA CompTox Chemicals Dashboard (comptox.epa.gov/dashboard) [31].

Table 1.

Overlap between data sources on the basis of substance identifier (DTXSID)

Data source eChemPortal TOXNET COSMOS ECVAM NTP
eChemPortal 38453 4132 2293 1830 7614
TOXNET 1338 5986 259 2421 2619
COSMOS 3346 2471 4403 589 2583
ECVAM 1041 1993 94 3059 1957
NTP 1259 811 225 477 2904

There were 54805 records across the 5 main data sources for 9299 unique substances. A major effort was undertaken to standardize the naming of assay types and assay calls. This involved resolving spelling errors and aggregating assays of the same type. For example, reports annotated as Ames, Ames study, or Ames test(s) were re-labelled as a ‘bacterial reverse-mutation test’. Several mapping dictionaries were created to standardize terminology and ensure assays were not misclassified. Typical errors found included bacteria being flagged as a mammalian species in assays tagged as mammalian cell assays or chromosomal aberration assays in plants (onion, barley) being categorized the same as bone marrow micronucleus tests. The assays were then aggregated into four simple general categories to classify them as Ames, clastogen, gene mutation or other. For example, a mouse lymphoma Tk+/− assay was categorized as ‘clastogen’, whereas a DNA damage test, sister chromatid exchange test or a SOS chromotest would be categorized as ‘other’.

Only assays tagged as Ames or clastogen were carried forward into the remainder of the analysis. There was a total of 39507 records for the aggregated Ames and clastogen assay types, covering 8842 unique substances. The full dataset, which includes the assay category mapping, is provided on the EPA FTP site (ftp://newftp.epa.gov/Computational_Toxicology_Data/CCTE_Publication_Data/CCED_Publication_Data/PatlewiczGrace/CompTox-genetox/).

Given the different datasets used in this analysis, a workflow shown in Figure 2 is presented to clarify their construction.

Figure 2.


Explanation of the construction of the genotoxicity dataset and how the different subsets are used in the subsequent analysis

2.2. Chemical Structures

All 9299 unique substances were submitted as a batch query to the EPA CompTox Chemicals Dashboard [32,33] in order to retrieve QSAR-Ready SMILES. QSAR-Ready SMILES were available for 7511 substances. Chemical substances in the DSSTox database have been curated and standardized to ensure correctness in chemical structure as well as their associations to chemical names and other identifiers such as CAS registry numbers. Examples of this curation include checking for errors and mismatches in chemical structure formats and mapping to identifiers, as well as structure validation issues such as hyper-valency [31–32, 34].

2.3. Genotoxicity categorization

The categorization scheme as outlined in Figure 3 was applied to the entire dataset compiled. The workflow proceeded as follows. Studies were first grouped by chemical. If a positive Ames or clastogen study was found, then the chemical was categorized as ‘genotoxic’. If no positive results were found, but the chemical had been evaluated as negative in experimental studies, it was categorized as non-genotoxic. Chemicals with only inconclusive experimental data reported were categorized as inconclusive. Of the starting set of 9299 substances, categorizations could be made for 8442, which had associated Ames or clastogenicity data. For these 8442 substances, 2055 chemicals were categorized as Ames positive, 673 as clastogenic, 5328 as Ames negative, 257 as not clastogenic, and 129 as inconclusive.

Figure 3.


Workflow for classifying chemicals as genotoxic, non-genotoxic, or inconclusive. Chemicals labeled as Mutagen or Clastogen are considered genotoxic. Workflow based on categorizing a chemical as genotoxic if it is positive in at least one assay.
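The decision order described for the categorization scheme can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the function name `categorize`, the record format (pairs of assay category and outcome), and the label strings are all hypothetical, and the handling of chemicals with a mix of negative and inconclusive results is an assumption.

```python
from typing import Iterable, Tuple


def categorize(records: Iterable[Tuple[str, str]]) -> str:
    """Categorize one chemical from its (assay_category, outcome) records.

    assay_category is "ames" or "clastogen"; outcome is "positive",
    "negative" or "inconclusive". These encodings are hypothetical.
    """
    records = list(records)
    ames = [outcome for assay, outcome in records if assay == "ames"]
    clastogen = [outcome for assay, outcome in records if assay == "clastogen"]

    # Any single positive Ames study makes the chemical genotoxic.
    if "positive" in ames:
        return "genotoxic (Ames positive)"
    # Otherwise, any single positive clastogen study does.
    if "positive" in clastogen:
        return "genotoxic (clastogen)"
    # Assumption: with no positives, one negative result is enough
    # to call the chemical non-genotoxic.
    if "negative" in ames or "negative" in clastogen:
        return "non-genotoxic"
    # Only inconclusive results (or no Ames/clastogen data) remain.
    return "inconclusive"


# Nine negative Ames studies and one positive one: still genotoxic.
example = [("ames", "negative")] * 9 + [("ames", "positive")]
print(categorize(example))
```

The ordering of the checks is what makes the scheme health-conservative: a single positive result always outweighs any number of negative ones.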

2.4. Genotoxicity predictions from QSAR tools and alerts

Although there is a plethora of public and commercial models predicting Ames mutagenicity, as noted earlier, QSAR models for predicting clastogenic endpoints are much more limited. Accordingly, publicly available structural alert schemes within the OECD Toolbox and 3 QSAR models within the VEGA platform [36,37] were used to provide an indication of clastogenic potential. Ames mutagenicity predictions were generated using two tools, EPA TEST (Toxicity Estimation Software Tool) [35] and VEGA [36,37]. In addition, six different structural alert schemes within the OECD QSAR Toolbox version 4.4.1 were used to profile the substances [38,39].

The EPA TEST tool contains an Ames mutagenicity module that includes predictive models using a hierarchical-clustering approach, a nearest-neighbor approach, and an FDA approach on a training dataset of 6512 chemicals [35, 40]. The FDA method makes a prediction for each chemical using a unique cluster, constructed at runtime, that contains structurally similar chemicals selected from the overall training set. The batch prediction functionality available on the command line in TEST version 5.1 was used to calculate the consensus prediction of all the above approaches as implemented within the tool. VEGA version 1.15.47 [36] contained a consensus Ames model and 4 other Ames models, in addition to 2 micronucleus models and 1 chromosomal aberration model; all were run in batch mode.

The OECD QSAR Toolbox [38,39] is a software application that facilitates data-gap-filling techniques such as read-across as needed for assessing the hazards of chemicals. The Toolbox contains several profilers (structural alert schemes) for chemical grouping and categorization. Six profilers were used that characterized either gene mutation or clastogenicity: DNA alerts for AMES, CA and MNT by OASIS, DNA binding by OASIS, DNA binding by OECD, protein binding alerts for CA by OASIS, in vitro mutagenicity (Ames test) alerts by the Istituto Superiore di Sanità (ISS), and in vivo mutagenicity (MNT) alerts by ISS. DNA OASIS profilers include alerts derived from the training sets used for the Tissue Metabolism Simulator (TIMES) expert system [41,42], whereas the ISS alerts rely on the ISSCAN database [43]. Each of these six alert schemes was treated as an individual tool/model for mutagenicity prediction, and the presence of an alert in a chemical was considered as a positive outcome for genotoxicity potential. The models and alerts are collectively referred to as tools hereafter, and a summary is provided in Table 2.

Table 2.

QSAR tools and Structural alerts evaluated in this study

In Silico Tools Tool Type Label Details
Toxicity Estimation Software Tool (TEST) [28] QSAR T1 Ames consensus model
VEGA [36] QSAR T2-T9 T2: Consensus Ames model
T3: Caesar Ames model
T4: SarPy/IRFMN Ames model
T5: ISS Ames model
T6: KNN/Read-Across Ames model
T7: CORAL Chromosomal aberration model
T8: IRFMN/VERMEER in vitro micronucleus model
T9: IRFMN in vivo micronucleus model
OECD Toolbox [31] Alerts A1-A6 A1: DNA alerts for AMES, CA and MNT by OASIS
A2: DNA binding by OASIS
A3: DNA binding by OECD
A4: Protein binding alerts for Chromosomal aberration by OASIS
A5: in vitro mutagenicity (Ames test) alerts by ISS
A6: in vivo mutagenicity (Micronucleus) alerts by ISS

Although TEST and VEGA provide an indication of the underlying training set, this information was not readily available with the structural alert schemes used. Hence, the chemicals were not verified to be present in the training dataset for any of the QSAR tools and structural alerts in order to keep the analysis unbiased. Subsequently, the tools were used to predict the genotoxicity call for all the chemicals in the dataset used in this study. Predictive performance of the tools was evaluated using accuracy, sensitivity, specificity, balanced accuracy, positive predictivity, negative predictivity and the inter-rater agreement Kappa coefficient [44]. A correlation matrix was constructed to perform pairwise comparisons of the tool performances relative to each other to help identify any redundancies in the tools themselves as well as to select which to carry forward into subsequent ensemble modeling.
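For readers who want to reproduce the performance metrics named above, the following minimal sketch computes them from binary labels (1 = genotoxic, 0 = non-genotoxic). The function names are hypothetical and this is not the code used in the study; metrics whose denominator is zero are returned as `None` rather than raising an error, which is one possible convention among several.

```python
def _ratio(num, den):
    """Return num / den, or None when the denominator is zero."""
    return num / den if den else None


def classification_metrics(y_true, y_pred):
    """Compute standard binary classification metrics from 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    n = tp + tn + fp + fn

    sens = _ratio(tp, tp + fn)   # true positive rate
    spec = _ratio(tn, tn + fp)   # true negative rate
    acc = _ratio(tp + tn, n)
    bal = None if sens is None or spec is None else (sens + spec) / 2

    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_e = _ratio((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp), n * n)
    if acc is None or p_e is None or p_e == 1:
        kappa = None
    else:
        kappa = (acc - p_e) / (1 - p_e)

    return {
        "accuracy": acc,
        "sensitivity": sens,
        "specificity": spec,
        "balanced_accuracy": bal,
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
        "kappa": kappa,
    }


metrics = classification_metrics([1, 1, 1, 0, 0, 0, 0, 0],
                                 [1, 1, 0, 0, 0, 0, 1, 0])
print(metrics)
```

Balanced accuracy is the mean of sensitivity and specificity, which makes it less sensitive than plain accuracy to the imbalance between genotoxic and non-genotoxic chemicals.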

2.5. Naïve Bayes ensemble model for predicting genotoxicity

Ensemble modeling is a class of machine learning problem where a set of base classifiers (models) are combined using an aggregating function to enable a final prediction [45]. In this instance, QSAR models, EPA TEST and VEGA, and the structural alerts from the Toolbox are tagged as the base classifiers/models/descriptors, whereas the Naïve Bayes algorithm is used as the aggregating function (as adapted from Pradeep et al. [46]).

Predictions from any given combination of models may be aggregated in two main ways:

  1. Using a combination approach, termed Combination scheme 1 (C1), where all tools are considered equivalent. For example, consider combinations 2 and 3 in Table 3. In combination number 2, A3 has a prediction of 1 and in combination number 3, T8 has a prediction of 1. In this case both these combinations are considered equivalent, since it does not matter which tool has a prediction of 1. Mathematically, the final combination can be calculated as a sum of the binary predictions as follows:
    $$\text{Combination scheme 1} = T1 + T5 + T8 + A3 \qquad \text{(Eq. 1)}$$
  2. Using a permutation approach, termed Combination scheme 2 (C2), where the tools are non-equivalent and the specific combination of tools that have a prediction of 1 is important. In the previous example of combination numbers 2 and 3 from Table 3, a prediction of 1 from tool A3 versus T8 is considered different; thus it matters which tool has a prediction of 1. Mathematically, the final combination number can be calculated using a binary-to-decimal conversion as follows:
    $$\text{Combination scheme 2} = 2^3 \times T1 + 2^2 \times T5 + 2^1 \times T8 + 2^0 \times A3 \qquad \text{(Eq. 2)}$$

Table 3.

Development of combination schemes 1 and 2 for use in the ensemble model. In this illustration, 4 tools (N=4) are used to develop the 2 combinations. C1 and C2 refer to combination scheme 1 and combination scheme 2, respectively.

Combination Number T1 T5 T8 A3 C1 C2
1 0 0 0 0 0 0
2 0 0 0 1 1 1
3 0 0 1 0 1 2
4 0 0 1 1 2 3
... ... ... ... ... ... ...
15 1 1 1 0 3 14
16 1 1 1 1 4 15
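The two combination schemes of Eq. 1 and Eq. 2 are simple to compute. The sketch below, with hypothetical function names, assumes the binary predictions are supplied in the fixed tool order (T1, T5, T8, A3) used in Table 3.

```python
def combination_scheme_1(preds):
    """C1: number of tools predicting 1 (tools treated as equivalent)."""
    return sum(preds)


def combination_scheme_2(preds):
    """C2: binary-to-decimal conversion, first tool = most significant bit."""
    value = 0
    for bit in preds:
        value = value * 2 + bit
    return value


# Predictions ordered as (T1, T5, T8, A3), as in Table 3.
row_15 = (1, 1, 1, 0)
row_16 = (1, 1, 1, 1)
print(combination_scheme_1(row_15), combination_scheme_2(row_15))  # 3 14
print(combination_scheme_1(row_16), combination_scheme_2(row_16))  # 4 15
```

With N tools, C1 takes only N + 1 distinct values whereas C2 takes 2^N, so C2 keeps track of which tools fired at the cost of fewer training chemicals per combination.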

Once the combinations were ascertained, the next step was to determine the posterior probability of a chemical being genotoxic or non-genotoxic, calculated using Bayes' theorem and the training dataset. Bayes' theorem is based on the concept of conditional probabilities. If the predictions from all the tools in the model for a chemical are combined into a single value (labeled combination_i, where subscript i denotes a combination number), then the posterior probability of the chemical being genotoxic can be calculated as follows:

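$$P(\text{genotoxic} \mid \text{combination}_i) = \frac{P(\text{combination}_i \mid \text{genotoxic}) \, P(\text{genotoxic})}{P(\text{combination}_i)} \qquad \text{(Eq. 3)}$$

where,

$$P(\text{combination}_i \mid \text{genotoxic}) = \frac{N(\text{combination}_i \mid \text{genotoxic})}{N(\text{genotoxic})} \qquad \text{(Eq. 4)}$$

$$P(\text{combination}_i) = \frac{N(\text{combination}_i)}{\sum_{i=1}^{k} N(\text{combination}_i)} \qquad \text{(Eq. 5)}$$

$$P(\text{genotoxic}) = \frac{N(\text{genotoxic})}{N(\text{genotoxic}) + N(\text{non-genotoxic})} \qquad \text{(Eq. 6)}$$

Using Eq. 4–6 and the fact that the total number of combinations is the same as the sum of genotoxic and non-genotoxic chemicals, Eq. 3 can be simplified as follows:

$$P(\text{genotoxic} \mid \text{combination}_i) = \frac{N(\text{combination}_i \mid \text{genotoxic})}{N(\text{combination}_i)} \qquad \text{(Eq. 7)}$$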

Once the training dataset has been used to compute the posterior probabilities for all the combinations, the posterior probability of the prediction combination for a new (test) chemical can be compared to a cut-off to derive a prediction. Multiple cut-off probability values (range, 0.25–0.60) were evaluated, and a cut-off value with the best performance metric for each combination was chosen as the decision cut-off value for eventual analysis. For example, if the combination performance was the best at a cut-off probability of 0.4, then 0.4 was chosen as the decision cut-off value for subsequent analysis. So, if a chemical with a prediction combination 3 had a posterior probability of 0.6, then it was classified as genotoxic because 0.6 is greater than the decision cut-off value of 0.4. This was performed within a 5-fold cross-validation scheme such that 80% of the dataset was used for derivation of probabilities, and the remaining 20% of the dataset was used to evaluate the accuracy of the Naïve Bayes ensemble model.
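The count-based posterior of Eq. 7 and the cut-off comparison can be sketched as follows. The function names and the toy data are hypothetical, and two details are assumptions rather than statements about the published code: a strict "greater than" comparison against the cut-off, and returning no prediction for a combination that never occurred in the training fold.

```python
from collections import Counter


def fit_posteriors(combos, labels):
    """Eq. 7: P(genotoxic | combination) = N(combination | genotoxic) / N(combination).

    combos: combination number for each training chemical.
    labels: 1 for genotoxic, 0 for non-genotoxic.
    """
    totals = Counter(combos)
    genotoxic = Counter(c for c, y in zip(combos, labels) if y == 1)
    return {c: genotoxic[c] / n for c, n in totals.items()}


def predict(posteriors, combo, cutoff=0.4):
    """Return 1 (genotoxic) or 0, or None for an unseen combination."""
    p = posteriors.get(combo)
    if p is None:
        return None  # assumption: no call for combinations absent from training
    return int(p > cutoff)  # assumption: strict inequality


# Toy training fold: combination 3 occurs 5 times, 3 of them genotoxic.
train_combos = [3, 3, 3, 3, 3, 0, 0, 0, 0, 0]
train_labels = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
posteriors = fit_posteriors(train_combos, train_labels)
print(posteriors[3], predict(posteriors, 3))  # 0.6 1
```

In a 5-fold cross-validation setting, `fit_posteriors` would be applied to the 80% training portion of each fold and `predict` to the held-out 20%.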

2.6. Single Assay Pair Analysis

As described earlier, the dataset compiled enabled an analysis of how often the outcomes from a single Ames and clastogen study resulted in a health-conservative call. The analysis was performed by sampling single pairs of assays from chemicals with multiple Ames and clastogen assays available, and for which the health-conservative call had been made based on the availability of multiple assays. The following steps were carried out:

  1. The dataset was filtered to extract chemicals with conclusive data in more than 2 Ames assays.

  2. For each chemical, one of the Ames assays was randomly sampled.

  3. If clastogen assay data were available for that chemical, then one clastogen assay was randomly sampled.

  4. If the sampled Ames assay outcome was positive, the chemical was categorized as genotoxic. If the sampled Ames assay outcome was negative, the clastogen assay outcome was evaluated. If the sampled clastogen assay outcome was positive, the chemical was categorized as genotoxic else non-genotoxic.

  5. For each chemical in the dataset from Step 1, Steps 2–4 were repeated 500 times, resulting in 500 replicates for the chemical’s genotoxicity call.

Based on this resampled data, the frequency of a health-conservative genotoxicity call could then be calculated.
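The resampling steps above can be illustrated with the following sketch for a single chemical. It is a simplified reading of the procedure, not the authors' code: the function names are hypothetical, outcomes are encoded as 1 (positive) or 0 (negative), and the "frequency of a health-conservative call" is assumed here to mean the fraction of replicates whose single-pair call matches the call obtained from all available assays.

```python
import random


def single_pair_call(ames, clastogen, rng):
    """Steps 2-4: call from one sampled Ames and one sampled clastogen result."""
    if rng.choice(ames) == 1:
        return 1  # sampled Ames assay positive -> genotoxic
    if clastogen and rng.choice(clastogen) == 1:
        return 1  # Ames negative, sampled clastogen assay positive
    return 0      # assumption: non-genotoxic when no clastogen data exist


def conservative_call_frequency(ames, clastogen, n_rep=500, seed=0):
    """Step 5: fraction of replicates agreeing with the full-data call."""
    rng = random.Random(seed)
    full_call = int(any(ames) or any(clastogen))
    hits = sum(
        single_pair_call(ames, clastogen, rng) == full_call
        for _ in range(n_rep)
    )
    return hits / n_rep


# A chemical positive in 1 of 3 Ames assays and with no clastogen data:
# roughly one third of single-assay replicates reproduce the genotoxic call.
print(conservative_call_frequency([1, 0, 0], []))
```

A chemical with uniformly positive or uniformly negative results always reproduces its full-data call, so disagreement arises only for chemicals with mixed outcomes.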

2.7. Code and Data Availability

The software code for data analysis and model development was written in Python 3.8 [47,48]. The code is available on github https://github.com/g-patlewicz/genetox and the supplementary data files are available on the EPA FTP site under https://gaftp.epa.gov/Comptox/CCTE_Publication_Data/CCED_Publication_Data/PatlewiczGrace/CompTox-genetox/.

3. Results and Discussion

3.1. Genotoxicity classification

Figure 4 summarizes the distribution of chemicals classified as genotoxic, non-genotoxic, or inconclusive under the categorization scheme. Categorizations could be determined for 8442 of the 9299 substances. The majority of substances (5328) were assigned as non-mutagenic (Ames negative), 2055 were categorized as Ames positive, 673 as clastogenic, 257 as non-clastogenic, and 129 as inconclusive. Approximately 14% of chemicals that were active in <50% of the Ames assays in which they were tested were categorized as genotoxic, and chemicals with inconclusive Ames assay data were categorized as genotoxic when they had positive clastogenicity data in at least 1 clastogen assay (see Table 4). Table 5 lists the distribution of the number of assays in which a chemical was tested. Note that more than 80% of the chemicals were tested in five or fewer Ames assays, whereas more than 80% of the chemicals were tested in two or fewer clastogen assays. This is largely anticipated since Ames tests are routinely conducted first in any testing battery, with other tests typically being performed as confirmatory.

Figure 4.


Distribution of genotoxicity calls based on the categorization scheme for the 9299 substances. Note that only 8442 chemicals could be assigned given the availability of Ames and/or clastogenicity data.

Table 4.

Analysis of chemicals tested in the Ames assay and their classification as genotoxic, non-genotoxic, or inconclusive using the genotoxicity categorization scheme. Note that approximately 14% of the chemicals that were active in less than 50% of the Ames assays in which they were tested are still classified as genotoxic, highlighting the conservative nature of the classification scheme.

Ames Assay (% active, count)
Genotoxic Non-genotoxic Inconclusive
< 50% (6303) 14.2% (897) 84.5% (5328) 1.2% (78)
≥ 50% (1673) 100.0% (1673) 0% (0) 0% (0)
Inconclusive (82) 42.7% (35) 0% (0) 57.3% (47)

Table 5.

Number of chemicals tested in K or fewer (K = 1, 2, 3, 4, 5 and 10) Ames or clastogen assays. Note that over 60% of chemicals were tested in a maximum of just 2 Ames assays, and >80% of chemicals were tested in a maximum of 2 clastogen assays.

Number of Assays Number of chemicals tested in Ames assays (%) Number of chemicals tested in Clastogen assays (%)
≤1 3582 (51.1%) 5384 (63.8%)
≤2 5191 (61.5%) 6852 (81.2%)
≤3 6512 (77.1%) 7288 (86.3%)
≤4 7088 (83.96%) 7745 (91.7%)
≤5 7378 (87.4%) 7865 (93.2%)
≤10 8129 (96.3%) 8270 (98.0%)

3.2. Evaluation of QSAR tools and structural alerts

The genotoxicity predictions from the QSAR tools and structural alerts were then evaluated against the genotoxicity classifications from the categorization scheme. Predictive performance of individual tools was evaluated using accuracy, sensitivity, specificity, balanced accuracy, positive and negative predictive values, and the inter-rater agreement Kappa coefficient [44]. Table 6 shows the performance metrics. In general, the QSAR models and structural alerts appeared to perform similarly on the basis of their balanced accuracies, which ranged from 56.98–73.16% (mean balanced accuracy of 66.39%). It should also be noted that these balanced accuracies are likely inflated, since no attempt was made to exclude chemicals that might have been part of the underlying training sets. However, if the other metrics are considered, it is observed that only 3 models had higher sensitivities than their specificities. The majority of models had high specificities, reflecting higher true negative frequencies. Models T9, A1 and A4 had particularly low sensitivities (24.32–36.38%); T9 and A4 were models associated with clastogenicity predictions, whereas A1 was a hybrid structural alert scheme for both Ames and clastogenicity. Curiously, one of the other clastogenicity models within VEGA, T8, had a much higher sensitivity (79.86%), albeit with a lower specificity (51.05%).

Table 6.

Performance metrics of in silico tools and structural alerts against the genotoxicity categorization scheme. The number n in parenthesis refers to the number of chemicals that were predicted by the tool and had experimental data for comparison purposes.

Predictor Acc (%) Sens (%) Spec (%) BalAcc (%) Kappa PPV (%) NPV (%)
Genotoxicity Categorization scheme
EPA TEST T1 (n = 6645) 74.07 48.55 87.66 68.1 0.39 67.68 76.19
VEGA T2 (n = 4724) 76.44 63.34 82.98 73.16 0.47 65.04 81.92
VEGA T3 (n = 4724) 72.31 63.66 76.63 70.14 0.39 57.65 80.84
VEGA T4 (n = 4724) 71.87 58.07 78.76 68.42 0.37 57.74 78.99
VEGA T5 (n = 4724) 73.60 61.75 79.52 70.63 0.41 60.11 80.62
VEGA T6 (n = 4724) 75.61 61.25 82.79 72.02 0.45 64.01 81.04
VEGA T7 (n = 4724) 68.88 44.66 80.98 62.82 0.27 53.99 74.55
VEGA T8 (n = 4724) 60.65 79.86 51.05 65.45 0.26 44.91 83.53
VEGA T9 (n = 4724) 69.58 27.57 90.57 59.07 0.21 59.37 71.45
OECD A1 (n = 6179) 75.68 36.38 96.23 66.31 0.38 83.46 74.31
OECD A2 (n = 6179) 71.47 54.10 80.55 67.33 0.35 59.27 77.04
OECD A3 (n = 6179) 64.70 60.04 67.14 63.59 0.26 48.87 76.26
OECD A4 (n = 6179) 67.21 24.32 89.65 56.98 0.16 55.13 69.37
OECD A5 (n = 6179) 72.21 60.79 78.19 69.49 0.39 59.31 79.22
OECD A6 (n = 6179) 57.55 77.43 47.15 62.29 0.21 43.39 79.97
Ensemble Model (n = 4542) [Combination scheme 2 of tools T1, T5, T8, A3] 76.86 87.24 75.20 81.22 0.39 36.03 97.35

Notes: Sens = Sensitivity, Spec = Specificity, Bal Acc = Balanced Accuracy, NPV = Negative Predictive Value, PPV = Positive Predictive Value

3.3. Performance analysis of the Ensemble Model

For any machine learning problem, and indeed for ensemble modelling, it is typical to remove correlated descriptors. Figure 5 shows a heatmap of Pearson correlation coefficients between the QSAR models (T1-T9) and the structural alerts (A1-A6). The Pearson correlation coefficient (range: −1 to +1) was used to identify pairs of tools with a coefficient greater than or equal to 0.8, so that redundant tools could be excluded from the ensemble model. A correlation of +1 means perfect positive correlation, −1 means perfect negative correlation, and 0 denotes the absence of a linear relationship. As shown in Figure 5, most of these tools had low correlation with each other, the exceptions being the pairs T2/T6 and T5/A5. This demonstrated that although the models potentially overlap in their underlying training sets, there were differences in the underlying models.

Figure 5.


Heatmap depicting correlation between different predictors (QSAR tools T1-T9, and OECD structural alerts A1-A6 as described in Table 2). Based on the coefficient of correlation (>=0.8), alert A5 and T6 were dropped from the ensemble model due to their high correlation. Models A1, A4 and T9 were dropped on account of their low sensitivities. The maximum number of tools considered in the ensemble model was 10 (7 QSAR and 3 alerts).

T2 and T6 were highly correlated, with a correlation coefficient of 0.8. T2 represented the Consensus Ames model within VEGA, whereas T6 was the KNN/Read-across model within VEGA. A5 and T5 were also highly correlated, with an even higher coefficient of 0.97; the two models were the in vivo Micronucleus alerts by ISS as encoded in VEGA and the OECD Toolbox. Given that the origin of these models is the same, it is not surprising that the coefficient is so high.
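
The redundancy screen described above can be sketched as follows. This is an illustrative implementation under stated assumptions, not the authors' code: the helper names and the toy prediction vectors are hypothetical, and only the 0.8 threshold comes from the text.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences.

    Assumes neither sequence is constant (a constant sequence has zero variance).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def correlated_pairs(predictions, threshold=0.8):
    """Return (tool_a, tool_b, r) for every pair of tools with r >= threshold."""
    flagged = []
    for a, b in combinations(sorted(predictions), 2):
        r = pearson(predictions[a], predictions[b])
        if r >= threshold:
            flagged.append((a, b, round(r, 2)))
    return flagged


# Toy binary predictions (1 = genotoxic, 0 = non-genotoxic); hypothetical values
toy_predictions = {
    "T5": [1, 0, 1, 1, 0, 0, 1, 0],
    "A5": [1, 0, 1, 1, 0, 0, 1, 0],  # identical to T5, so r = 1
    "T1": [1, 1, 0, 1, 0, 1, 0, 0],  # uncorrelated with the other two
}
flagged = correlated_pairs(toy_predictions)
```

For each flagged pair, one member would then be dropped before building the ensemble.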

A5 and T6 were dropped as descriptors on account of their redundancy. The sensitivities of models A1, A4 and T9 were very low (24–36%). Because high sensitivity is important to minimize false negatives, these models were also dropped as descriptors, leaving a total of 10 tools (T1-T5, T7-T8, A2, A3 and A6) for ensemble model development. Various models were developed using combinations of subsets of these tools, taking N tools at a time. The value of N was varied from 2 to 10 (the total number of tools), leading to a total of 1013 different models. Since each tool has a binary (genotoxic, 1 or non-genotoxic, 0) prediction, predictions from N tools can be combined in 2^N different ways. When all 10 tools are used in the model, the total number of combinations is 1024 (2^10).
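
The counts quoted above can be reproduced with a short enumeration. This is a sketch for checking the arithmetic only; the variable names are illustrative.

```python
from itertools import combinations
from math import comb

# The 10 tools retained for ensemble model development
tools = ["T1", "T2", "T3", "T4", "T5", "T7", "T8", "A2", "A3", "A6"]

# Every subset of N tools, for N = 2 .. 10
subsets = [
    subset
    for n in range(2, len(tools) + 1)
    for subset in combinations(tools, n)
]
n_models = len(subsets)

# Same count in closed form: sum of C(10, N) for N = 2 .. 10
n_models_closed_form = sum(comb(len(tools), n) for n in range(2, len(tools) + 1))

# Number of distinct binary prediction patterns when all 10 tools are used
n_patterns_all_tools = 2 ** len(tools)
```

The total of 1013 is 2^10 minus the empty set (1) and the ten single-tool subsets.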

The dataset was split into 5 folds such that 80% of the dataset was used for derivation of probabilities, and the remaining 20% of the dataset was used to evaluate the accuracy of the Naïve Bayes ensemble model. The ratio of non-genotoxic to genotoxic chemicals was 1.9:1 in the training sets, whereas the ratio in the test sets was 1.80:1. The training dataset was used to develop Combination schemes 1 and 2 as explained above using N (2–10) tools at a time, leading to 1013 different models. If the various thresholds are considered, for each substance there were up to 18,234 different model predictions. Summary performance metrics could only be calculated for 16,994 of the 18,234 models due to division by zero errors. The complete set of metrics was filtered on the basis of each combination to retain models with the highest balanced accuracies and maximum prediction cut-off value. This subset comprised 2025 models for consideration. For plotting purposes, the performance metrics on the test set for the first 100 models are shown in Figures 6 and 7. Figure 6 shows the classification metrics for the models developed using Combination scheme 1, whereas Figure 7 shows the classification metrics for Combination scheme 2. In general, for all combinations, the models had accuracies and balanced accuracies ranging from 34.32–80% and 17.26–81.22%, respectively. The mean balanced accuracy overall was 72.35%, with mean sensitivity and specificity of 65.98% and 78.74%. The mean balanced accuracy in Combination scheme 1 was 75.97%, whereas it was slightly higher at 77.49% in Combination scheme 2. The mean sensitivity and specificity were 78.65% and 73.28% in Combination scheme 1, whereas these metrics were 78.42% and 76.56% in Combination scheme 2. Since the performance metrics were generally higher in Combination scheme 2, the ‘best’ model was selected from Combination scheme 2 on the basis of highest balanced accuracy and highest sensitivity.
The selected model from Combination scheme 2 performed better than the individual tools at predicting the categorization outcome, with an accuracy of 76.86%, a balanced accuracy of 81.22%, a sensitivity of 87.24% and a specificity of 75.20%. The final classification metrics for all the combinations are provided as supplemental data (see metrics_combns.csv in the processed data directory of the associated FTP data file).
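
The 80/20 fold structure described above can be sketched as follows. This is a simplified illustration, not the authors' pipeline: the per-combination probability is estimated here as a plain empirical frequency in the training folds, which stands in for the Naïve Bayes derivation and may differ from it. The toy data, the helper names and the random seed are all hypothetical.

```python
import random
from collections import defaultdict


def five_fold_indices(n_items, seed=0):
    """Shuffle item indices and deal them into 5 folds of near-equal size."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    return [indices[k::5] for k in range(5)]


def empirical_probabilities(rows):
    """Estimate P(genotoxic | combination number) from (combination, label) rows."""
    counts = defaultdict(lambda: [0, 0])  # combination -> [n_genotoxic, n_total]
    for combination, label in rows:
        counts[combination][0] += label
        counts[combination][1] += 1
    return {c: positives / total for c, (positives, total) in counts.items()}


# Toy data: (combination number, label), with label 1 = genotoxic. Hypothetical.
data = [(i % 4, 1 if (i % 4 == 3 or i % 7 == 0) else 0) for i in range(100)]

fold_sizes = []
for test_indices in five_fold_indices(len(data)):
    held_out = set(test_indices)
    train = [data[i] for i in range(len(data)) if i not in held_out]  # 80%
    test = [data[i] for i in test_indices]                            # 20%
    probabilities = empirical_probabilities(train)
    fold_sizes.append((len(train), len(test)))
```

Within each fold, the probabilities derived from the training portion would be compared against a cut-off to classify the held-out chemicals.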

Figure 6.

Performance metrics of the first 100 ensemble models based on Combination scheme 1 compared against the genotoxicity categorization scheme. The combination with the highest accuracy within these 100 is circled, and its balanced accuracy value (%) is annotated in black, with the cut-off value for selection indicated in parentheses. The mean balanced accuracy for the full set of models is shown as a blue dotted line, and 1 standard deviation above and below this mean is shown as red dotted lines.

Figure 7.

Performance metrics of the first 100 ensemble models based on Combination scheme 2 compared against the genotoxicity categorization scheme. The combination with the highest balanced accuracy in this set is circled, and its accuracy value (%) is annotated in black, with the cut-off value for selection indicated in parentheses. The mean balanced accuracy for all the models is shown as a blue dotted line, and 1 standard deviation above and below this mean is shown as red dotted lines.

3.4. Final Ensemble Genotoxicity Model

Based on the performance evaluation presented in Section 3.3, the final model selected was an ensemble of the base models EPA TEST (T1), VEGA Mutagenicity (Ames test) model (ISS) (T5), VEGA In vitro Micronucleus activity (IRFMN/VERMEER) (T8) and DNA binding alerts by OASIS (A3), combined using Combination scheme 2. This model had the highest balanced accuracy, accuracy and sensitivity relative to the individual models. High sensitivity was a particular consideration in order to limit false negatives. The model was evaluated on the 4542 chemicals that had prediction data from all the relevant tools, and the best performance was obtained using a cut-off probability value of 0.55. The performance metrics of the final model are listed in Table 6. To generate a genotoxicity prediction for a new chemical, the predictions from the base models were used to calculate the combination number using Combination scheme 2. If the posterior probability for the resultant combination was greater than the cut-off value of 0.55, then the chemical was classified as genotoxic; otherwise the chemical was classified as non-genotoxic. For combination numbers that were not present in the training data, a posterior probability could not be calculated, and any chemical resulting in that combination number was classified as out of domain. Table 7 presents a look-up table that can be used to classify a new chemical based on the final ensemble model.

Table 7.

Look-up table to derive final ensemble genotoxicity predictions using predictions from EPA TEST (T1), VEGA (T5, T8) and DNA binding alerts by OECD (A3), combined using Combination scheme 2. For each prediction combination, the combination number and posterior probability are listed. For a new test chemical, if the posterior probability was greater than the cut-off value of 0.55, then the prediction outcome was genotoxic; otherwise it was non-genotoxic. If the combination number was not found in the training data, a posterior probability could not be calculated; thus, that combination number fell out of domain for predictions.

T1 T5 T8 A3 Combination_Number Posterior_Probability Prediction_Outcome
0 0 0 0 0 0.14 non-genotoxic
0 0 0 1 1 0.18 non-genotoxic
0 0 1 0 2 0.22 non-genotoxic
0 0 1 1 3 0.4 non-genotoxic
0 1 0 0 4 0.19 non-genotoxic
0 1 0 1 5 0.38 non-genotoxic
0 1 1 0 6 0.29 non-genotoxic
0 1 1 1 7 0.51 non-genotoxic
1 0 0 0 8 0.09 non-genotoxic
1 0 0 1 9 0.2 non-genotoxic
1 0 1 0 10 0.21 non-genotoxic
1 0 1 1 11 0.47 non-genotoxic
1 1 0 0 12 0.2 non-genotoxic
1 1 0 1 13 0.36 non-genotoxic
1 1 1 0 14 0.42 non-genotoxic
1 1 1 1 15 0.67 genotoxic
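
The look-up in Table 7 can be expressed as a short function. This is a sketch under one stated assumption: the combination numbers in the table coincide with reading (T1, T5, T8, A3) as a 4-bit binary number, and the code uses that encoding. The function and variable names are hypothetical; the probabilities and the 0.55 cut-off are copied from the table and text.

```python
from typing import Dict

# Posterior probabilities from Table 7, keyed by combination number
POSTERIOR: Dict[int, float] = {
    0: 0.14, 1: 0.18, 2: 0.22, 3: 0.40,
    4: 0.19, 5: 0.38, 6: 0.29, 7: 0.51,
    8: 0.09, 9: 0.20, 10: 0.21, 11: 0.47,
    12: 0.20, 13: 0.36, 14: 0.42, 15: 0.67,
}
CUTOFF = 0.55


def combination_number(t1: int, t5: int, t8: int, a3: int) -> int:
    """Encode four binary predictions as one integer (assumed 4-bit encoding)."""
    return (t1 << 3) | (t5 << 2) | (t8 << 1) | a3


def classify(t1: int, t5: int, t8: int, a3: int,
             posterior: Dict[int, float] = POSTERIOR,
             cutoff: float = CUTOFF) -> str:
    """Return 'genotoxic', 'non-genotoxic', or 'out of domain'."""
    probability = posterior.get(combination_number(t1, t5, t8, a3))
    if probability is None:
        # Combination not seen in the training data: no posterior available
        return "out of domain"
    return "genotoxic" if probability > cutoff else "non-genotoxic"


call_all_positive = classify(1, 1, 1, 1)
call_three_positive = classify(0, 1, 1, 1)
```

With the values in Table 7, every combination number has a posterior probability, so the out-of-domain branch only applies if the table is rebuilt from training data that lack some combinations.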

3.5. Results of Single Assay Analysis

There were 3184 chemicals that had experimental data from more than 2 conclusive Ames assays for which the single assay analysis could be run. Figure 8 shows the results of this analysis. 75% of chemicals produced the health-conservative genotoxicity call for every assay selection; this was because for these chemicals, almost all Ames and all Clastogen assay results were the same. There were a few chemicals where 25% or less of the time the health-conservative call would have been achieved with a single assay pair. These were the chemicals for which the number of negative assay results far outnumbered the number of positive assay results; however, these cases were in the minority.
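
One way to picture this analysis is sketched here. It rests on assumptions that are not confirmed by this section: that a "single assay pair" means one Ames result plus one Clastogen result, that the health-conservative call is genotoxic whenever any assay is positive, and that every possible pair is weighted equally. The function names and the example results are hypothetical.

```python
from itertools import product


def conservative_call(ames, clastogen):
    """Health-conservative call: 1 (genotoxic) if any assay result is positive."""
    return int(any(ames) or any(clastogen))


def percent_pairs_matching(ames, clastogen):
    """Percent of (Ames, Clastogen) result pairs that reproduce the conservative call."""
    target = conservative_call(ames, clastogen)
    pairs = list(product(ames, clastogen))
    hits = sum(1 for a, c in pairs if int(a or c) == target)
    return 100.0 * hits / len(pairs)


# Hypothetical chemical with concordant results: every pair gives the same call
concordant = percent_pairs_matching(ames=[1, 1, 1], clastogen=[1, 1])

# Hypothetical chemical where negatives far outnumber positives
mostly_negative = percent_pairs_matching(ames=[1, 0, 0, 0], clastogen=[0, 0])
```

In the second example only the pairs containing the single positive Ames result reproduce the conservative call, which mirrors the minority of chemicals described above.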

Figure 8.

Results of single assay analysis. The x-axis shows the percent of times a chemical would generate the health-conservative genotoxicity call with a single experimental assay, and the y-axis gives the fraction of chemicals with a specified percentage.

4. Conclusions

Risk-based prioritization for thousands of chemicals based on genotoxicity is a requirement for the US EPA under the amended TSCA. Here, we compiled a large dataset of experimental data from a range of public sources to evaluate a categorization scheme that relied on the presence of at least one Ames or Clastogen assay outcome to make an overall call for genotoxicity. The dataset comprised 9299 substances with results from a range of guideline and non-guideline studies. The dataset was curated to reassign misclassified study types and tagged into broad study types: Ames, Clastogen, ‘gene-mutation’ and Other. An assessment of quality of a small proportion of the dataset was performed as part of a separate study, although there were significant challenges in accessing some of the primary literature for the 238 substances considered (see Patlewicz et al, in prep).

There were 8442 chemicals with Ames and/or clastogenicity data that were evaluated in several ways. Exploration of this subset of the dataset showed that >60% of chemicals were tested in a maximum of 2 Ames assays, and >80% of chemicals were tested in a maximum of 2 Clastogen assays. For the large majority of chemicals for which there were multiple Ames or Clastogen assays, the results agreed ~75% of the time across all of the assays (Figure 8). Although a large dataset was compiled, in practice most of the chemicals requiring prioritization are data poor; hence, an approach to mimic the categorization scheme using in silico predictions was investigated. We have provided a simple scheme to use data from public structure-based tools to categorize chemicals as genotoxic or not (Section 3.4 and Table 7). In silico models and alerts of various types were used to generate predictions, and their performance was compared to the categorization scheme outcomes, either as individual models or as part of a Naïve Bayes ensemble model approach. The balanced accuracies of the individual models ranged from 57–73%, whereas the best performing combination, which comprised outcomes from 3 QSAR models and 1 alert scheme, had a balanced accuracy of 81%. Because the training data for the in silico models were not factored into the assessment (such data were not forthcoming for the structural alert schemes), the reported performance characteristics are likely to be inflated. Nonetheless, this combination of in silico models offers a useful component of a scheme to prioritize chemicals for health effects research and/or for making regulatory decisions.

Highlights.

  • Genotoxicity data were compiled for 9299 unique substances

  • A categorization scheme was developed to prioritize chemicals on the basis of their genotoxic potential

  • The predictive performance of a selection of (Q)SARs was evaluated against this categorization scheme

  • An ensemble in silico model was developed to mimic the categorization scheme

Acknowledgments:

This project was supported in part by an appointment to the Research Participation Program at the Center for Computational Toxicology and Exposure, U.S. Environmental Protection Agency, administered by the Oak Ridge Institute for Science and Education through an interagency agreement between the U.S. Department of Energy and EPA.

Footnotes

Publisher's Disclaimer: The views expressed in this paper are those of the authors and do not necessarily reflect the views or policies of the U.S. Environmental Protection Agency. Mention of trade names or commercial products does not constitute endorsement or recommendation for use.

References

  • 1. Administrator Memo Prioritizing Efforts to Reduce Animal Testing, September 10, 2019. https://www.epa.gov/research/administrator-memo-prioritizing-efforts-reduce-animal-testing-september-10-2019.
  • 2. Sakuratani Y, Horie M, Leinala E. (2018). Integrated Approaches to Testing and Assessment: OECD Activities on the Development and Use of Adverse Outcome Pathways and Case Studies. Basic Clin Pharmacol Toxicol. 123 Suppl 5, 20–28. doi: 10.1111/bcpt.12955.
  • 3. Patlewicz G, Wambaugh J, Felter SP, Simon TW, Becker RA. (2018). Utilising Threshold of Toxicological Concern (TTC) with high throughput exposure predictions (HTE) as a risk based prioritization approach for thousands of chemicals. Computational Toxicology 7, 58–67. doi: 10.1016/j.comtox.2018.07.002.
  • 4. Dearfield KL, Auletta AE, Cimino MC, Moore MM. (1991). Considerations in the U.S. Environmental Protection Agency’s testing approach for mutagenicity. Mutat Res. 258(3), 259–83. doi: 10.1016/0165-1110(91)90012-k.
  • 5. Ashby J (1986). The prospects for a simplified and internationally harmonized approach to the detection of possible human carcinogens and mutagens. Mutagenesis. 1(1), 3–16. doi: 10.1093/mutage/1.1.3.
  • 6. United Kingdom Committee on Mutagenicity of Chemicals in Food, Consumer Products and the Environment. Quantitative approaches to the assessment of genotoxicity data. (http://www.dh.gov.uk/assetRoot/04/07/71/96/04077196.pdf). Last assessed: April 2020.
  • 7. Eastmond DA, Hartwig A, Anderson D, Anwar WA, Cimino MC, Dobrev I, Douglas GR, Nohmi T, Phillips DH, Vickers C. (2009). Mutagenicity testing for chemical risk assessment: update of the WHO/IPCS Harmonized Scheme. Mutagenesis. 24(4), 341–349. doi: 10.1093/mutage/gep014.
  • 8. OECD (2014). Guidance Document 116 on the Conduct and Design of Chronic Toxicity and Carcinogenicity Studies, Supporting Test Guidelines 451, 452 and 453: Second edition, OECD Series on Testing and Assessment, No. 116, OECD Publishing, Paris. doi: 10.1787/9789264221475-en.
  • 9. OECD (2015). Guidance Document on Revisions to OECD Genetic Toxicology Test Guidelines. https://www.oecd.org/chemicalsafety/testing/Genetic%20Toxicology%20Guidance%20Document%20Aug%2031%202015.pdf
  • 10. International Conference on Harmonisation: Genotoxicity Testing and Data Interpretation for Pharmaceuticals Intended for Human Use (2011). http://www.ich.org/fileadmin/Public_Web_Site/ICH_Products/Guidelines/Safety/S2_R1/Step4/S2R1_Step4.pdf.
  • 11. NTP Genetic Toxicology. https://ntp.niehs.nih.gov/testing/types/genetic/index.html.
  • 12. Lloyd M, Kidd D. (2012). The mouse lymphoma assay. Methods Mol Biol. 817, 35–54. doi: 10.1007/978-1-61779-421-6_3.
  • 13. Hayashi M (2016). The micronucleus test—most widely used in vivo genotoxicity test—. Genes and Environ 38, 18. doi: 10.1186/s41021-016-0044-x.
  • 14. Galloway SM, Aardema MJ, Ishidate M Jr, Ivett JL, Kirkland DJ, Morita T, Mosesso P, Sofuni T. (1994). Report from working group on in vitro tests for chromosomal aberrations. Mutat Res. 312(3), 241–261. doi: 10.1016/0165-1161(94)00012-3.
  • 15. Mortelmans K, Zeiger E. (2000). The Ames Salmonella/microsome mutagenicity assay. Mutat Res. 455(1–2), 29–60. doi: 10.1016/s0027-5107(00)00064-6.
  • 16. Williams RV, DeMarini DM, Stankowski LF Jr, Escobar PA, Zeiger E, Howe J, Elespuru R, Cross KP. (2019). Are all bacterial strains required by OECD mutagenicity test guideline TG471 needed? Mutation Research/Genetic Toxicology and Environmental Mutagenesis, 2019. 848: p. 503081.
  • 17. Serafimova R, Worth A, Fuart Gatnik M. (2012). Review of QSAR Models and software tools for predicting genotoxicity and carcinogenicity. Institute for Health and Consumer Protection (Joint Research Centre) European Commission. doi: 10.2788/2612323.
  • 18. Bakhtyari NG, Raitano G, Benfenati E, Martin T and Young D (2013) Comparison of in silico models for prediction of mutagenicity. J Environ Sci Health C Environ Carcinog Ecotoxicol Rev 31: 45–66.
  • 19. Amberg A, Beilke L, Bercu J, Bower D, Brigo A, Cross KP, Custer L, Dobo K, Dowdy E, Ford KA, Glowienke S, Van Gompel J, Harvey J, Hasselgren C, Honma M, Jolly R, Kemper R, Kenyon M, Kruhlak N, Leavitt P, Miller S, Muster W, Nicolette J, Plaper A, Powley M, Quigley DP, Reddy MV, Spirkl HP, Stavitskaya L, Teasdale A, Weiner S, Welch DS, White A, Wichard J, Myatt GJ. (2016). Principles and procedures for implementation of ICH M7 recommended (Q)SAR analyses. Regul Toxicol Pharmacol. 77, 13–24. doi: 10.1016/j.yrtph.2016.02.004.
  • 20. Cassano A, Raitano G, Mombelli E, Fernández A, Cester J, Roncaglioni A, Benfenati E. (2014). Evaluation of QSAR models for the prediction of ames genotoxicity: a retrospective exercise on the chemical substances registered under the EU REACH regulation. J Environ Sci Health C Environ Carcinog Ecotoxicol Rev. 2014;32(3):273–98. doi: 10.1080/10590501.2014.938955.
  • 21. Benigni R, Bossa C. (2019). Data-based review of QSARs for predicting genotoxicity: the state of the art. Mutagenesis. 34(1), 17–23. doi: 10.1093/mutage/gey028.
  • 22. Honma M (2020). An assessment of mutagenicity of chemical substances by (quantitative) structure–activity relationship. Genes and Environ 42, 23 (2020). doi: 10.1186/s41021-020-00163-1.
  • 23. EFSA (2017). Guidance on the use of the weight of evidence approach in scientific assessments. EFSA Journal 15(8), 4971.
  • 24. Benfenati E, Chaudhry Q, Gini G, Dorne JL. (2019). Integrating in silico models and read-across methods for predicting toxicity of chemicals: A step-wise strategy. Environment International. 11, 105060. doi: 10.1016/j.envint.2019.105060.
  • 25. OECD (2020). Test No. 471: Bacterial Reverse Mutation Test, OECD Guidelines for the Testing of Chemicals, Section 4, OECD Publishing, Paris. doi: 10.1787/9789264071247-en.
  • 26. OECD (2016). Test No. 476: In Vitro Mammalian Cell Gene Mutation Tests using the Hprt and xprt genes, OECD Guidelines for the Testing of Chemicals, Section 4, OECD Publishing, Paris. doi: 10.1787/9789264264809-en.
  • 27. OECD (2016). Test No. 473: In Vitro Mammalian Chromosomal Aberration Test, OECD Guidelines for the Testing of Chemicals, Section 4, OECD Publishing, Paris. doi: 10.1787/9789264264649-en.
  • 28. OECD (2016). Test No. 487: In Vitro Mammalian Cell Micronucleus Test, OECD Guidelines for the Testing of Chemicals, Section 4, OECD Publishing, Paris. doi: 10.1787/9789264264861-en.
  • 29. OECD (2016). Test No. 474: Mammalian Erythrocyte Micronucleus Test, OECD Guidelines for the Testing of Chemicals, Section 4, OECD Publishing, Paris. doi: 10.1787/9789264264762-en.
  • 30. OECD (2016). Test No. 475: Mammalian Bone Marrow Chromosomal Aberration Test, OECD Guidelines for the Testing of Chemicals, Section 4, OECD Publishing, Paris. doi: 10.1787/9789264264786-en.
  • 31. Grulke CM, Williams AJ, Thillanadarajah I, Richard AM. (2019). EPA’s DSSTox database: History of development of a curated chemistry resource supporting computational toxicology research. Computational Toxicology, 12, 100096.
  • 32. Richard AM and Williams CR, Distributed structure-searchable toxicity (DSSTox) public database network: a proposal. Mutation Research/Fundamental and Molecular Mechanisms of Mutagenesis, 2002. 499(1): p. 27–52.
  • 33. Williams AJ, Grulke CM, Edwards J, McEachran AD, Mansouri K, Baker NC, Patlewicz G, Shah I, Wambaugh JF, Judson RS, Richard AM. (2017). The CompTox Chemistry Dashboard: a community data resource for environmental chemistry. J Cheminform. 2017 9(1):61. doi: 10.1186/s13321-017-0247-6.
  • 34. Young D, Martin TM, Venkatapathy R, Harten P. (2008). Are the Chemical Structures in Your QSAR Correct? QSAR & Combinatorial Science 27(11):1337–1345.
  • 35. EPA Toxicity Estimation Software Tool (TEST). https://www.epa.gov/chemical-research/toxicity-estimation-software-tool-test.
  • 36. Benfenati E, Manganaro A, Gini G. (2013). VEGA-QSAR: AI inside a platform for predictive toxicology. Proceedings of the Workshop ‘Popularise Artificial Intelligence 2013’. CEUR Workshop Proceedings; Vol 1107.
  • 37. https://www.vegahub.eu/about-vegahub/
  • 38. The OECD QSAR Toolbox. https://www.oecd.org/chemicalsafety/risk-assessment/oecd-qsar-toolbox.htm.
  • 39. Schultz TW, Diderich R, Kuseva CD, Mekenyan OG. The OECD QSAR Toolbox Starts Its Second Decade. Methods Mol Biol. 2018;1800:55–77. doi: 10.1007/978-1-4939-7899-1_2.
  • 40. Hansen K, Mika S, Schroeter T, Sutter A, ter Laak A, Steger-Hartmann T, Heinrich N, Müller KR. Benchmark data set for in silico prediction of Ames mutagenicity. J Chem Inf Model. 2009 September;49(9):2077–81. doi: 10.1021/ci900161g.
  • 41. Mekenyan OG, Dimitrov SD, Pavlov TS, Veith GD. A systematic approach to simulating metabolism in computational toxicology. I. The TIMES heuristic modelling framework. Curr Pharm Des. 2004;10(11):1273–93. doi: 10.2174/1381612043452596.
  • 42. Serafimova R, Todorov M, Pavlov T, Kotov S, Jacob E, Aptula A, Mekenyan O. Identification of the structural requirements for mutagenicity, by incorporating molecular flexibility and metabolic activation of chemicals. II. General Ames mutagenicity model. Chem Res Toxicol. 2007 April;20(4):662–76. doi: 10.1021/tx6003369. Epub 2007 Mar 24. Erratum in: Chem Res Toxicol. 2007 Aug;20(8):1225.
  • 43. Benigni R, Bossa C, Richard AM, Yang C. A novel approach: chemical relational databases, and the role of the ISSCAN database on assessing chemical carcinogenicity. Ann Ist Super Sanita. 2008;44(1):48–56.
  • 44. Cohen J, A Coefficient of Agreement for Nominal Scales. Educational and Psychological Measurement, 2016. 20(1): p. 37–46.
  • 45. Kuncheva LI and Rodríguez JJ, A weighted voting framework for classifiers ensembles. Knowledge and Information Systems, 2012. 38(2): p. 259–275.
  • 46. Pradeep P, Povinelli RJ, White S, Merrill SJ. An ensemble model of QSAR tools for regulatory risk assessment. J Cheminform. 2016 September 22;8:48. doi: 10.1186/s13321-016-0164-0.
  • 47. Python Software Foundation. Python Language Reference, version 3.8. Available at http://www.python.org.
  • 48. Pedregosa F, et al., Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research, 2011. 12: p. 2825–2830.

Data Availability Statement

The software code for data analysis and model development was written in Python 3.8 [47,48]. The code is available on GitHub at https://github.com/g-patlewicz/genetox, and the supplementary data files are available on the EPA FTP site at https://gaftp.epa.gov/Comptox/CCTE_Publication_Data/CCED_Publication_Data/PatlewiczGrace/CompTox-genetox/.
