Abstract
The Toxic Substances Control Act (TSCA) became law in the U.S. in 1976 and was amended in 2016. The amended law requires the U.S. EPA to perform risk-based evaluations of existing chemicals. Here, we developed a tiered approach to screen potential candidates based on their genotoxicity and carcinogenicity information to inform the selection of candidate chemicals for prioritization under TSCA. The approach was underpinned by a large database of carcinogenicity and genotoxicity information that had been compiled from various public sources. Carcinogenicity data included weight-of-evidence human carcinogenicity evaluations and animal cancer data. Genotoxicity data included bacterial gene mutation data from the Salmonella (Ames) and Escherichia coli WP2 assays and chromosomal mutation (clastogenicity) data. Additionally, Ames and clastogenicity outcomes were predicted using the alert schemes within the OECD QSAR Toolbox and the Toxicity Estimation Software Tool (TEST).
The evaluation workflows for carcinogenicity and genotoxicity were developed along with associated scoring schemes to make an overall outcome determination. For this case study, two sets of chemicals, the TSCA Active Inventory non-confidential portion list available on the EPA CompTox Chemicals Dashboard (33,364 chemicals, ‘TSCA Active List’) and a representative proof-of-concept (POC) set of 238 chemicals, were profiled through the two workflows to make determinations of carcinogenicity and genotoxicity potential. Of the 33,364 substances on the ‘TSCA Active List’, overall calls could be made for 20,371 substances. Of these, 46.67% (9507) were non-genotoxic, 0.5% (103) were scored as inconclusive, 43.93% (8949) were predicted genotoxic, and 8.9% (1812) were genotoxic. Overall calls for genotoxicity could be made for 225 of the 238 POC chemicals. Of these, 40.44% (91) were non-genotoxic, 2.67% (6) were inconclusive, 6.22% (14) were predicted genotoxic, and 50.67% (114) were genotoxic. The approach shows promise as a means to identify potential candidates for prioritization from a genotoxicity and carcinogenicity perspective.
Keywords: Toxic Substances Control Act, TSCA, Genotoxicity, Mutagenicity, Carcinogenicity
1. Introduction
1.1. Background
The Toxic Substances Control Act (TSCA) was passed by Congress and signed into law by the President in 1976 and has been a central feature of chemical regulatory authority of the U.S. Environmental Protection Agency (U.S. EPA) ever since [1]. The history and impact of this legislation have been described elsewhere [2]. TSCA 1976 authorized the regulation of potential environmental and health risks of chemicals in U.S. commerce based on three policies: (1) adequate information should be developed with respect to the effect of chemical substances and mixtures on health and the environment and that the development of such information should be the responsibility of those who manufacture and those who process such chemical substances and mixtures; (2) adequate authority should exist to regulate chemical substances and mixtures which present an unreasonable risk of injury to health or the environment, and to take action with respect to chemical substances and mixtures which are imminent hazards; and (3) authority over chemical substances and mixtures should be exercised in such a manner as not to impede unduly or create unnecessary economic barriers to technological innovation while fulfilling the primary purpose of this chapter to assure that such innovation and commerce in such chemical substances and mixtures do not present an unreasonable risk of injury to health or the environment (https://www.epa.gov/laws-regulations/summary-toxic-substances-control-act).
Under TSCA, EPA is assigned the role of protecting against unreasonable risks to health and the environment from chemicals. Within EPA, the Office of Pollution Prevention and Toxics (OPPT) carries out much of this important work. By the 1970s there were many genotoxicity assays under development, culminating in >200 assays across a wide variety of organisms by the 1980s [3]. The Office of Research and Development (ORD) of the U.S. EPA initiated the EPA Gene-Tox Program in 1979, which involved 196 scientists worldwide who were tasked with evaluating the literature on a reduced number (23) of selected genotoxicity assays, resulting in 36 review articles [4]. These reviews formed the basis for the subsequent genotoxicity test guidelines of the U.S. EPA and the U.S. Food and Drug Administration (U.S. FDA) [4]. Many chemicals currently in commerce have not been adequately assessed for toxicity, including genotoxicity and carcinogenicity, leaving an opening for scientific progress to now address this data gap.
Among its many mandates, the law as amended in 2016 requires the U.S. EPA to (1) evaluate existing chemicals under enforceable deadlines, (2) make risk-based evaluations, and (3) increase public transparency of chemical information. The amended law also requires that there be consistent sources of funding for the U.S. EPA to perform a number of additional responsibilities [5].
1.2. Risk-based prioritization
Under Section 6(b) of amended TSCA, the U.S. EPA is required to prioritize existing chemical substances for risk evaluation. ‘Prioritization’ is a public process with deadlines in which the U.S. EPA is required to designate chemical substances as either high-priority or low-priority for risk evaluation. TSCA includes deadlines for completing risk evaluations for high-priority chemicals and then designating additional chemicals as high-priority [6].
The Office of Research and Development (ORD) of the U.S. EPA developed a tiered approach to facilitate the assessment of available hazard, exposure, persistence and bioaccumulation information for thousands of chemicals to provide information that could assist U.S. EPA in identifying potential chemicals as candidates for high- or low-priority for risk evaluation [7, 8]. One of the workflows considered genotoxicity and carcinogenicity data. The tiered approach took advantage of the enormous amount of research conducted on genotoxicity and carcinogenicity since TSCA became law in 1976, as well as a large dataset of experimental and predicted genotoxicity information that had been compiled.
This manuscript outlines the different information streams used, the rationale for their selection, how they were scored, and how they were combined into a tiered workflow to facilitate subsequent evaluation of the two sets of chemicals. One set consisted of 33,364 substances: the TSCA Active Inventory non-confidential portion (updated March 20th, 2020) that is available on the EPA CompTox Chemicals Dashboard [9], referred to herein as the ‘TSCA Active List’. The second list was a smaller subset of 238 chemicals, which was selected as a Proof-of-Concept (POC) set that balanced various considerations, including chemical diversity and other information sources (exposure and other hazard endpoints) beyond carcinogenicity and genotoxicity. The genesis of the POC set is described in more detail in two associated EPA reports, a white paper [7] and the final report [8]. In brief, the POC set was selected from the non-confidential TSCA active inventory (the ‘TSCA Active List’) and included the initial proposed set of 20 high-priority and 20 low-priority candidate substances, substances from the 2014 update to the TSCA workplan [10], and a subset of substances listed in the FDA Substances Added to Food inventory as well as the EPA’s Safer Choice Safer Chemical Ingredients List (SCIL). Although the different information streams are described for carcinogenicity herein, this manuscript focuses on the genotoxicity workflow since this relied on both experimental and in silico data.
2. Material and methods
2.1. Information streams considered in the tiered workflow approach
2.1.1. Carcinogenicity information
Distinction between rodent/human carcinogens
The ability of an agent to cause cancer (carcinogenicity) in humans is determined by a weight-of-evidence assessment that typically includes exposure, epidemiological, animal cancer, and mechanistic data. For example, the International Agency for Research on Cancer (IARC), which is part of the World Health Organization (WHO), convenes working groups to perform carcinogenicity weight-of-evidence assessments [11]. Although most IARC Group 1 (known) human carcinogens are also carcinogenic in animals, 34% (38/111) have no animal tumor sites specified because there is insufficient or no animal cancer data, among other reasons [12,13].
Conversely, not all rodent carcinogens are necessarily human carcinogens. For example, some specific types of tumors in rodents are associated with tissues or biological pathways that do not occur in humans. Only 25% (1250/5000) of chemicals evaluated for rodent carcinogenicity by the U.S. National Toxicology Program (NTP) are carcinogenic in both rats and mice [14]. Carcinogenicity in more than one rodent species is a feature exhibited by many Group 1 human carcinogens [12,13] and is considered to reflect a category of rodent carcinogen that is likely to be a human carcinogen [15]. In the absence of a comprehensive weight-of-evidence assessment, evidence of carcinogenicity in humans can be extrapolated from animal (typically rodent) carcinogenicity studies. In this study, chemicals for which there is evidence that they cause cancer in humans were distinguished from those for which there is carcinogenicity data only in animals.
2.1.2. Chemical dataset: carcinogenicity data
Weight-of-evidence evaluations from the following sources were considered as authoritative assessments of the potential of agents to cause cancer in humans: (a) IARC [11], (b) the Integrated Risk Information System (IRIS) of the U.S. EPA [16], (c) the Office of Pesticide Programs of the U.S. EPA [17], (d) the Provisional Peer Reviewed Toxicity Values for Superfund of the U.S. EPA [18], (e) the California Environmental Protection Agency [19], (f) the Report on Carcinogens (RoC) of the U.S. NTP [20], (g) the Federal Contaminated Site Risk Assessment of Health Canada [21], and (h) the U.S. National Institute for Occupational Safety and Health (NIOSH) [22].
The cancer determination from these authoritative assessments regarding the potential of agents to cause cancer in humans was accepted as the most up-to-date ‘ultimate decision’ on the carcinogenicity of agents to humans. In situations where multiple assessments arrived at different cancer determinations, the most health-protective evaluation was selected. No further assessment of the primary data used in these assessments was performed as part of the tiered workflow.
In the absence of an authoritative assessment of the ability of an agent to cause cancer in humans as described above, the ability of the agent to cause cancer in animals (typically in rodents), derived from the peer-reviewed literature, was used as an indication of the carcinogenic potential of an agent. Specifically, data were considered both from in vivo cancer studies, some of which may not have included a full weight-of-evidence analysis or the derivation of an oral slope factor or an inhalation unit risk, and from assessments that report on the ability of an agent to cause cancer without derivation of quantitative toxicity values.
Acceptable rodent bioassays ranged from repeat-dose chronic assays to those that incorporate the standards established by authoritative organizations such as the NTP, which generally requires the evaluation of the agent in 50 male and 50 female rats, as well as 50 male and 50 female mice, exposed for at least 2 years to at least 3 doses, one of which is the maximum tolerated dose [23]. These data were used in a qualitative fashion in the tiered workflow: the potency of the carcinogen was not considered, only the qualitative evidence of carcinogenicity (yes or no).
For carcinogenicity, the most notable in silico model available is OncoLogic™ [24], which is an expert system that provides a concern level for carcinogenicity based on structure and exposure information. A concern level is reported as an outcome together with a detailed rationalization of the alert highlighted. The concern level can differ depending on the route of exposure for the substance under consideration. Although OncoLogic™ is perhaps the most comprehensive publicly available system for carcinogenicity prediction in silico, its current software infrastructure does not facilitate large-scale batch processing of substances (even the latest version 9, released January 2021, does not permit batch processing); thus, no in silico carcinogenicity tools were used in the carcinogenicity workflow.
2.1.3. Carcinogenicity workflow evaluation
The process for assigning scores to chemicals for carcinogenicity is presented in Figure 1 and Table 1. Chemicals with no information regarding their carcinogenicity were given a value of 0; chemicals with evidence of non-carcinogenicity in animals or humans were given a value of 1; chemicals determined to cause cancer in animals but that had not otherwise been assessed for their ability to cause cancer in humans were given a value of 2; chemicals determined to have evidence as possible or probable human carcinogens were given a value of 3; and chemicals with evidence of known human carcinogenicity determined by an authoritative source were given a value of 4.
Figure 1.

Tiered evaluation process associated with the carcinogenicity domain. The workflow begins with the determination of human carcinogenicity from an authoritative source and ends with one of the dashed-line boxes. Solid-line boxes represent intermediate decision points. IG is the information-gathering flag.
Table 1.
Criteria used to calculate the carcinogenicity domain metric^a
| Value | Carcinogenicity determination |
|---|---|
| 0 | No available data for carcinogenicity |
| 1 | Evidence of non-carcinogenicity in animals or humans |
| 2 | Evidence for animal carcinogenicity but no assessment for human carcinogenicity |
| 3 | Evidence as a possible or probable human carcinogen |
| 4 | Known human carcinogen |
^a Information gathering (IG) flags for predicted data and conflicting data in the same assay.
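The scoring in Table 1 can be sketched in Python as follows. This is an illustrative sketch only, not the authors' code: the category labels and the function name `carcinogenicity_score` are hypothetical, and taking the maximum value across sources is one simple way to encode the rule that the most health-protective determination is selected when assessments differ.

```python
# Hypothetical labels for the determinations in Table 1 (assumed, not from the source code).
CARC_SCORES = {
    "no_data": 0,
    "non_carcinogenic": 1,
    "animal_carcinogen": 2,
    "possible_or_probable_human_carcinogen": 3,
    "known_human_carcinogen": 4,
}


def carcinogenicity_score(determinations):
    """Return the Table 1 value for one chemical.

    determinations: list of labels (keys of CARC_SCORES) gathered from the
    available sources. An empty list means no data were found.
    """
    if not determinations:
        return CARC_SCORES["no_data"]
    # Assumption: the most health-protective determination is the highest value.
    return max(CARC_SCORES[d] for d in determinations)
```

For example, a chemical with one source reporting `"animal_carcinogen"` and another reporting `"known_human_carcinogen"` would receive a value of 4 under this sketch.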
2.1.4. Genotoxicity information
Definition of genotoxicity
Genotoxicity is distinctly different from carcinogenicity. Genotoxicity refers to the ability of agents to induce DNA damage, such as DNA strand breaks or DNA adducts (where an agent is bound covalently to DNA), as well as to induce mutations, i.e., heritable changes in DNA sequence or numbers of chromosomes (aneuploidy) [25]. An agent is considered genotoxic if it can induce DNA damage and/or mutation. DNA damage does not necessarily lead to DNA mutations, especially if the DNA damage is repaired prior to DNA replication. Some assays detect DNA damage, such as the comet assay, whereas others detect mutation, such as the Salmonella (Ames) mutagenicity assay in bacteria.
Genotoxicity is a key characteristic of many IARC Group 1 human carcinogens [26], but not all human carcinogens display genotoxic modes of action [13]. Likewise, numerous assessments have shown that, depending on the chemical classes and numbers of compounds in the dataset, as well as the genotoxicity assay or combinations of assays considered, ~10–40% of rodent carcinogens are not known to be genotoxic [27–33]. Non-genotoxic carcinogens are a less prevalent but important category of rodent and human carcinogens, causing cancer by a variety of mechanisms such as altering binding and/or modification of receptors, altering gene expression, and inhibiting gap junction intercellular communication [34,35]. Since genotoxicity alone is insufficient for considering an agent carcinogenic, carcinogenicity and genotoxicity have been considered as distinct categories for the purposes of this study.
As discussed above, because most human and rodent carcinogens are genotoxic, genotoxicity has been considered an indicator of potential carcinogenicity [25,35]. This has been demonstrated most clearly for the Salmonella (Ames) mutagenicity assay, where a mutagen should be considered a potential rodent carcinogen; however, one cannot conclude that a non-mutagen is not a rodent carcinogen [32,33]. For screening purposes, in the absence of either human or rodent cancer data, genotoxicity may be considered an indicator of potential carcinogenicity; however, this must be done with the awareness that genotoxicity is just one of many factors considered when assessing an agent for its ability to cause cancer in humans [11,26].
Rationale for the selection of specific genotoxicity assays
Since the implementation by the U.S. EPA of the initial TSCA, many studies have assessed which combinations of genotoxicity tests are best at detecting rodent carcinogens [36], resulting in the testing schemes recommended by the Organization for Economic Cooperation and Development (OECD) Genetic Toxicology Test Guidelines [37], the International Conference on Harmonization [38], and the NTP [23].
Most regulatory bodies in the U.S., such as the U.S. EPA and U.S. FDA, recommend the OECD Genetic Toxicology Test Guidelines, which include a set of bacterial assays for gene mutation using various strains of Salmonella (the Ames strains) and several strains of Escherichia coli WP2, as well as several assays for chromosomal mutation (also called clastogenicity); these include in vitro chromosome aberration, mouse bone-marrow micronucleus, and the mouse lymphoma Tk+/− assays [37]. The rationale for this combination of assays is that some genotoxic agents produce primarily gene mutation, whereas others produce primarily chromosomal mutation (clastogenicity); most produce both [25]. A small number produce aneuploidy (chromosome gain or loss), which is also detected by the chromosome aberration or micronucleus assays.
Although carcinogenicity assessments are not based solely on genotoxicity assays, genotoxicity can sometimes be used as a predictor of carcinogenicity. However, there is little evidence that any combination of these assays is more predictive of rodent carcinogenicity than the Salmonella (Ames) mutagenicity assay alone [32,33]. That is, adding mammalian cell or rodent assays for chromosomal mutation (clastogenicity) was no more predictive of rodent carcinogenicity than the Ames assay alone.
A recent analysis using a database of >10,000 compounds found that just two of the five OECD-recommended bacterial strains of Salmonella (TA98 and TA100) identified 93% of the mutagens identified by all five recommended bacterial strains, and they identified 99% when the assays for chromosomal mutation were included [39]. A previous analysis in which 100 chemicals chosen randomly from among 65,725 chemicals in commercial use were tested in TA98 and TA100 found that 22% were mutagenic, providing an approximation of the number of untested organic compounds in commercial use that might be mutagenic [40].
Based on the considerations above, which include numerous comparative analyses performed during the past 45 years by a wide range of national and international agencies, institutions, and organizations [27–32], we considered that the genotoxicity of an agent could be reasonably assessed by evaluating data as recommended by OECD [37]. This evaluation includes the standard bacterial mutation assays in E. coli WP2 and the Ames strains of Salmonella, as well as the main assays for chromosomal mutation (in vitro chromosome aberration, in vivo/in vitro micronucleus, and mouse lymphoma Tk+/− assays). A chemical was considered genotoxic if it was positive in any of these assays.
2.1.5. Chemical dataset: genotoxicity experimental data
The dataset used in this study was compiled from public sources of experimental genotoxicity information as discussed in reference [41]. This dataset was subsequently filtered to retain only results for standardized genotoxicity assays.
The public sources included:
COSMOS [http://www.cosmostox.eu/what/COSMOSdb/], a collection of experimental data on chemical hazard primarily for cosmetics ingredients.
TOXNET (CCRIS and GENE-TOX) which includes unsupported databases of genotoxicity data from the National Library of Medicine that can be indirectly accessed through PubChem [https://www.nlm.nih.gov/toxnet/index.html].
eChemPortal [https://www.echemportal.org/echemportal/], a portal managed by OECD that contains information from many different Member Country Agencies. eChemPortal includes information extracted from EU REACH dossiers.
the National Toxicology Program (NTP) bioassay genetox conclusion dataset (https://doi.org/10.22427/NTP-DATA-022–00002-0002–000-8) which provides treatment related findings including bioassay genetox conclusions from chronic bioassay level of evidence, bacterial mutagenicity, micronucleus, Tox21 and comet assay.
EURL ECVAM Genotoxicity and Carcinogenicity consolidated Database of Ames Positive Chemicals (https://ec.europa.eu/jrc/en/scientific-tool/eurl-ecvam-genotoxicity-and-carcinogenicity-consolidated-database-ames-positive-chemicals), a structured database that compiles available genotoxicity and carcinogenicity data for Ames positive chemicals originating from a number of different sources.
The starting dataset, which was a compilation of all these sources, consisted of 54,805 records for 9299 unique substances, as identified by their DSSTox substance identifier, DTXSID [42]. A mapping was performed to standardize terminology and ensure assays were not misclassified. Assays were then aggregated into four broad categories: ‘Ames’, ‘clastogen’, ‘gene-mutagen’, or ‘other’. ‘Ames’ referred to the standard bacterial mutation assays in E. coli WP2 and Salmonella, whereas ‘clastogen’ was any in vitro or in vivo chromosomal mutation assay (e.g. in vitro or in vivo micronucleus assay, in vitro chromosomal aberration test, and mouse lymphoma test). ‘Gene-mutagen’ typically comprised studies where the OECD TG 476 in vitro mammalian gene mutation test using Hprt and xprt genes [43] had been performed. ‘Other’ captured any study type that was considered non-standard, such as studies using yeast, fungi, or plants such as barley. The dataset (filename: ‘genetox_merged_110221_final.xlsx’) is provided as part of the supporting data on the EPA FTP site: https://gaftp.epa.gov/Comptox/CCTE_Publication_Data/CCED_Publication_Data/PatlewiczGrace/CompTox-tsca.
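The aggregation of assays into the four broad categories could be sketched as a simple keyword lookup. The keywords, their order of precedence, and the function name `assay_category` below are assumptions made for illustration; the actual terminology mapping used to build the dataset is not reproduced here.

```python
def assay_category(assay_name):
    """Map a free-text assay name to 'Ames', 'clastogen', 'gene-mutagen' or 'other'.

    The keyword lists are illustrative assumptions, not the published mapping.
    """
    name = assay_name.lower()
    # Standard bacterial mutation assays (Salmonella and E. coli WP2).
    if any(k in name for k in ("ames", "salmonella", "e. coli wp2", "bacterial reverse mutation")):
        return "Ames"
    # In vitro or in vivo chromosomal mutation assays.
    if any(k in name for k in ("micronucleus", "chromosom", "mouse lymphoma")):
        return "clastogen"
    # In vitro mammalian gene mutation tests (e.g. OECD TG 476).
    if any(k in name for k in ("hprt", "xprt", "mammalian gene mutation")):
        return "gene-mutagen"
    # Anything else is treated as non-standard.
    return "other"
```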
2.1.6. Chemical dataset: In Silico Approaches
In the absence of experimental genotoxicity data, in silico approaches for identifying potential genotoxicity outcomes, namely (Quantitative) Structure Activity Relationships (Q)SARs as well as expert systems [44] were considered. Expert systems are formalized systems that implement QSARs, SARs, or both. QSARs are underpinned by the principle that the molecular structure and subsequent molecular properties of a chemical define how that chemical interacts with a specific biological system. QSAR approaches use a formalized quantitative method of characterizing the relationship between chemical structure and activity, whereas structural alerts (SARs) are based on a qualitative relationship in which a structural moiety is associated with activity [44].
There are many (Q)SAR models available in the literature for predicting genotoxic endpoints [45–47]. Several of these are local models whereby a relationship has been established between a congeneric set of substances, such as aromatic amines, and an endpoint, such as mutagenicity in the Salmonella (Ames) mutagenicity assay. There are also many global models for different genotoxic endpoints that are underpinned by heterogeneous datasets. Most of the global models available are tailored to predict results in the Ames assay. A handful have been developed to address other genotoxic study types, such as the in vitro chromosome aberration assay, but many of these models are commercial in nature [45–47].
In the tiered approach, mutagenicity was predicted using two publicly available software tools: the OECD QSAR Toolbox v4.4 [48] and the U.S. EPA’s Toxicity Estimation Software Tool (TEST) [49]. The OECD QSAR Toolbox [50] is a software tool designed to facilitate the development, evaluation, justification, and documentation of chemical categories for data-gap filling approaches such as read-across. The structural alert profilers within the Toolbox are intended to facilitate the grouping of chemicals for data-gap filling rather than to make predictions of specific effects themselves. In this study, the profilers provided a means of identifying features that were qualitatively associated with genotoxicity-related effects.
The specific profilers used from the OECD QSAR Toolbox v4.4 [50] were the ‘DNA alerts for AMES, CA and MNT by OASIS’, ‘in vitro mutagenicity (Ames) alerts by the Istituto Superiore di Sanità (ISS)’ in Italy, ‘in vivo mutagenicity (micronucleus) alerts by ISS’, ‘protein binding alerts for chromosome aberration by OASIS’, ‘DNA binding by OASIS’, and ‘DNA binding by OECD’. The profilers are categorized into different types: the first four are known as endpoint-specific profilers, whereas DNA binding by OECD or OASIS are known as general mechanistic profilers. The latter comprise alerts that capture the reaction chemistry associated with genotoxicity based on the underlying hypothesis that the electrophilic potential of a chemical is associated with genotoxic properties. The general mechanistic profilers tend to comprise a larger number of alerts than their associated endpoint-specific profilers because alerts in the endpoint-specific profilers are substantiated and supported by experimental data. For example, an alert in the endpoint-specific category will be supported by empirical genotoxicity data, whereas an alert in a general mechanistic profiler may rest on an expert-driven hypothesis based on chemistry reasoning alone.
The outcomes from the alerts from the OECD QSAR Toolbox in most cases describe the alert name and the reaction domain, whether that be formation of a Schiff base, bimolecular nucleophilic substitution (SN2), Michael addition, radical, etc. The outcomes from the ISS profilers provide an alert name alone. For the purposes of this study, the alert names and descriptions were transformed into a binary representation to indicate the presence or absence of an alert. Among the structural profilers selected, two were tailored specifically towards alerts for clastogenicity, namely protein-binding alerts for chromosome aberration by OASIS, and in vivo mutagenicity (micronucleus) alerts by ISS.
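The transformation of profiler outputs into a binary representation might look like the sketch below. It assumes that a profiler reports either an alert description or a placeholder string when nothing is triggered; the placeholder values and the function name `alerts_to_binary` are assumptions for illustration and may not match the exact text exported by the Toolbox.

```python
def alerts_to_binary(profiler_outputs):
    """Convert profiler text outputs to 1 (alert present) or 0 (no alert).

    profiler_outputs: dict mapping profiler name -> exported text (or None).
    """
    # Assumed placeholder strings meaning that no alert was triggered.
    no_alert_values = {"", "no alert found"}
    return {
        profiler: 0 if (text or "").strip().lower() in no_alert_values else 1
        for profiler, text in profiler_outputs.items()
    }
```

Under these assumptions, an output such as `"SN2 >> alkylation"` would be recorded as 1, and an empty or `"No alert found"` entry as 0.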
The TEST model developed by the U.S. EPA includes a module to predict the overall outcome of the Ames assay. TEST QSAR models were shown to yield favorable predictions compared to other QSAR models in the literature [45]. The genotoxicity models in TEST were based on a dataset of Salmonella (Ames) mutagenicity results compiled by Hansen et al. [46]. TEST includes QSAR approaches such as hierarchical clustering, group contribution, and nearest-neighbor methods. TEST utilizes two-dimensional molecular descriptors as independent variables for the QSAR models. The recommended method for TEST is the consensus method, which calculates the average of the predictions from the other methods. This method has been shown to result in the best prediction coverage (fraction of chemicals that can be predicted) and prediction accuracy as detailed in the TEST User’s Guide [49]. Only the consensus model outcomes were carried forward into the tiered workflow. The dependent variable in the QSAR models is a binary (yes = 1, no = 0) mutagenicity determination though the predicted value can range between 0 and 1. Predicted values that were greater than or equal to 0.5 were assumed to be positive for mutagenicity, and values less than 0.5 were assumed to be negative.
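The consensus averaging and the 0.5 cut-off described above can be illustrated with a short sketch. The method names used as dictionary keys and the handling of methods that return no prediction (skipped here via `None`) are assumptions; this is not the TEST implementation itself.

```python
def consensus_call(method_predictions, threshold=0.5):
    """Average the per-method predictions and apply the mutagenicity cut-off.

    method_predictions: dict mapping method name -> predicted value in [0, 1],
    or None where a method could not make a prediction (assumed convention).
    Returns (consensus_value, binary_call), or (None, None) if nothing was predicted.
    """
    values = [v for v in method_predictions.values() if v is not None]
    if not values:
        return None, None
    consensus = sum(values) / len(values)
    # Values >= threshold are treated as positive for mutagenicity.
    return consensus, int(consensus >= threshold)
```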
2.1.7. Chemical dataset: Chemical structures
Chemical structures were extracted from the EPA CompTox Chemicals Dashboard [9, 42] by sending all the chemicals from the ‘TSCA Active List’ to the batch search to retrieve their QSAR-Ready SMILES. A text file of QSAR-Ready SMILES and DTXSID identifiers tagged as names was imported into the OECD QSAR Toolbox in order to process the substances through the various profilers. For TEST, structure data files (SDFs) were created and processed in batches using the Windows-based command-line version of the tool. The results from both tools (saved as text files) were transformed to match the experimental dataset structure. The predictions file comprised 135,129 records for 19,623 individual substances. Predictions were then appended to the experimental dataset, already described in section 2.1.5, to create one combined dataset of 189,934 records for 23,345 individual substances.
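Appending the predictions to the experimental records can be sketched as below. The record layout (dictionaries with a `dtxsid` key) and the added `source` field are hypothetical conventions chosen for this example; the published analysis may structure the data differently.

```python
def combine_records(experimental, predicted):
    """Append predicted records to experimental ones and count unique substances.

    Each record is assumed to be a dict with at least a 'dtxsid' key.
    Returns (combined_records, number_of_unique_substances).
    """
    combined = [dict(r, source="experimental") for r in experimental]
    combined += [dict(r, source="predicted") for r in predicted]
    substances = {r["dtxsid"] for r in combined}
    return combined, len(substances)
```

Because substances with both experimental and predicted records are counted once, the number of unique substances in the combined set is smaller than the sum of the two substance counts, whereas the record counts simply add (54,805 + 135,129 = 189,934).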
2.1.8. Genotoxicity categorization and evaluation
The workflow approach comprised the following steps. Studies were first grouped by chemical and by one of the four broad categories. If a positive Ames result was found, the chemical was categorized as ‘genotoxic’. If a positive clastogenicity assay result was found, the chemical was categorized as ‘clastogen’. Chemicals with reported inconclusive experimental data were categorized accordingly. If no positive results were found, but a chemical was evaluated as negative in experimental studies, it was categorized as ‘non-genotoxic’ or ‘non-clastogenic’, respectively. If none of these conditions were met for experimental studies, the same logic was applied to the in silico data, and chemicals with a positive prediction were categorized as ‘predicted genotoxic’ or ‘predicted clastogenic’. If no predicted positive result was found, the chemical was determined to be negative based on in silico models and categorized as ‘predicted non-genotoxic’ or ‘predicted non-clastogenic’. The overall categorizations were then converted into the scores detailed in Table 2 and shown in Figure 2. If no experimental data were available for a chemical and no prediction could be made, a score of 0 was assigned so that the lack of information was accounted for in the scoring on information availability. Non-genotoxic or non-clastogenic outcomes (predicted or experimental) were assigned a 1; inconclusive experimental results were scored a 2; predicted genotoxic or clastogenic outcomes were scored a 3; and experimental genotoxic or clastogenic outcomes were scored a 4. An IG flag, represented as a separate overall call, would inform the end-user that the overall outcome had been based solely on predicted data. In practice, this would be most relevant for negative predictions. Predictions from in silico tools were prefixed with a “p” for predicted.
For these screening purposes, no attempts were made to evaluate the quality and design of the studies, and the information gathering (IG) flag was intended to provide the end-user the opportunity to evaluate the weight-of-evidence in situations that were deemed inconclusive. The workflow was applied to two datasets: 1) the ‘TSCA Active List’, and 2) the subset of 238 proof-of-concept (POC) chemicals (see [8]).
Table 2.
Criteria used to calculate the genotoxicity domain metric^a
| Value | Genotoxicity determination |
|---|---|
| 0 | No available data for genotoxicity |
| 1 | Evidence of non-genotoxicity (predicted and/or measured data) |
| 2 | Inconclusive evidence of genotoxicity |
| 3 | Evidence of genotoxicity based on predicted data |
| 4 | Evidence of genotoxicity based on measured data |
^a Information gathering (IG) flags for predicted data and conflicting data in the same assay.
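A minimal sketch of the tiered scoring in Table 2 is given below for one chemical and one assay category. The outcome labels, the function name `genotoxicity_score`, the order in which inconclusive and negative experimental results are checked, and the exact conditions under which the IG flag is raised are assumptions for illustration; the published code should be consulted for the actual logic.

```python
def genotoxicity_score(experimental, predicted):
    """Return (score, ig_flag) following the tiers in Table 2.

    experimental, predicted: lists of outcome strings, each one of
    'positive', 'negative' or 'inconclusive' (assumed labels).
    """
    if "positive" in experimental:
        # Measured evidence of genotoxicity; flag conflicting measured results.
        return 4, "negative" in experimental
    if "inconclusive" in experimental:
        # Inconclusive measured evidence; flagged for information gathering.
        return 2, True
    if "negative" in experimental:
        return 1, False
    # No usable experimental data: fall back to in silico predictions.
    if "positive" in predicted:
        return 3, True
    if "negative" in predicted:
        return 1, True
    # Neither data nor predictions are available.
    return 0, False
```

In this sketch a chemical with only a negative prediction scores 1 with the IG flag set, which reflects the point that calls based solely on predicted data are most relevant to revisit when they are negative.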
Figure 2.

Tiered evaluation process associated with the genotoxicity domain. The process determines the gene and chromosomal (clastogenicity) mutagenicity of a chemical. IG is the information gathering flag.
2.1.9. Code and data availability
The software code for the data analysis was written in Python 3.7. The code is available on GitHub at https://github.com/g-patlewicz/genetox-tsca and the supplementary data files are available on the EPA FTP site under https://gaftp.epa.gov/Comptox/CCTE_Publication_Data/CCED_Publication_Data/PatlewiczGrace/CompTox-tsca.
3. Results and Discussion
3.1. Genotoxicity experimental dataset
There were 54,805 records for 9299 unique substances. There were 25,111 (45.82%) records corresponding with Ames results, 14,396 (26.26%) with clastogenicity results, 5271 (9.61%) with gene-mutation and 10,027 (18.29%) records were classified as ‘other’.
Figure 3 provides a snapshot view of the number of studies per chemical across the 4 broad categories for the first 50 records in the dataset. In most cases, a chemical has at least one Ames study. Figure 4 presents this information across the entire dataset to show how many studies of a given type a chemical might have. The histograms (Figure 4) show how frequently a substance has 1 or more Ames, clastogen, gene-mutation or other studies. For Ames studies, ~40% (3198) of substances have a single study, whereas ~20% (1609) of substances have 2 studies and ~16% (1321) of substances have 3 studies. For clastogen studies, ~36% (1695) of substances have a single study, 31% (1468) have 2 studies and ~10% (457) have 3 studies.
Figure 3.

The first 50 records from the experimental genotoxicity dataset aggregated by substance to illustrate the number of studies by type.
Figure 4.

Histograms of the number of studies per chemical per aggregate study type. The experimental genotoxicity data was aggregated by substance and aggregate study type to determine how many studies had been conducted per substance of a certain type.
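The aggregation behind Figures 3 and 4 can be sketched with the Python standard library. The records below are invented placeholders (the identifiers `CHEM-A` etc. are hypothetical), and the helper names are ours; the sketch only illustrates counting studies per substance and study type, then tabulating how many substances have exactly n studies of a given type.

```python
from collections import Counter

# Hypothetical records: (substance identifier, aggregate study type).
records = [
    ("CHEM-A", "Ames"), ("CHEM-A", "Ames"), ("CHEM-A", "clastogen"),
    ("CHEM-B", "Ames"), ("CHEM-B", "other"),
    ("CHEM-C", "Ames"), ("CHEM-C", "Ames"), ("CHEM-C", "gene-mutation"),
]


def studies_per_substance(records):
    """Count studies for each (substance, study type) pair."""
    return Counter(records)


def frequency_of_counts(per_substance, study_type):
    """Map n -> number of substances with exactly n studies of study_type."""
    return Counter(
        n for (_, kind), n in per_substance.items() if kind == study_type
    )


per_substance = studies_per_substance(records)
ames_hist = frequency_of_counts(per_substance, "Ames")

if __name__ == "__main__":
    for n_studies, n_substances in sorted(ames_hist.items()):
        print(f"{n_substances} substance(s) with {n_studies} Ames study/studies")
```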
One substance dominated in terms of number of studies; butylated hydroxytoluene [CASRN 128-37-0] (DTXSID2020216) was associated with 156 studies, of which 30 were Ames studies, 19 were clastogenicity studies, 1 was a gene-mutation study and 106 were categorized as ‘other’.
3.2. Genotoxicity workflow
The workflow was applied to two datasets: 1) the ‘TSCA Active List’, and 2) the 238 proof-of-concept chemicals (POC). Of the 33,364 substances on the ‘TSCA Active List’, overall calls on the basis of both experimental and in silico data could be made for 20,371 substances. Here, 46.67% (9507) of substances were non-genotoxic (score 1), 0.5% (103) were scored as inconclusive (score 2), 43.93% (8949) were predicted genotoxic (score 3) and 8.9% (1812) were genotoxic (score 4) (see Figure 5). Overall calls for genotoxicity could be made for 225 of the 238 POC chemicals. Of these, 40.44% (91) were non-genotoxic (score 1), 2.67% (6) were inconclusive (score 2), 6.22% (14) were predicted genotoxic (score 3), and 50.67% (114) were genotoxic (score 4) (Figure 6). Figures 5 and 6 show count plots of the distribution of the genotoxicity calls for the ‘TSCA Active List’ substances and the POC chemicals based on their numeric values (1–4). Overall, negative results were obtained for ~40–47% of the substances, and ~51–57% were positives in the two sets. The proportion of substances scored negative or positive in the POC set was comparable to that in the broader ‘TSCA Active List’, suggesting that the POC set was representative of the larger inventory. This in turn would facilitate identification of candidate high-priority and low-priority substances for further evaluation under amended TSCA for these endpoints.
Figure 5.

Count plot showing how the 20,371 ‘TSCA Active List’ substances for which an overall call could be made were assigned to each of the genotoxicity scores based on the criteria provided in Table 2. In the plot, score 1 indicates absence of genotoxicity in experimental studies if available or predictions; score 2 indicates inconsistent experimental results; score 3 are predicted genotoxicity outcomes and score 4 are positive genotoxic outcomes in experimental studies.
Figure 6.

Count plot showing how the 225 POC substances for which an overall call could be made were assigned to each of the genotoxicity scores based on the criteria provided in Table 2. In the plot, score 1 indicates absence of genotoxicity in experimental studies if available or predictions; score 2 indicates inconsistent experimental results; score 3 are predicted genotoxicity outcomes and score 4 are positive genotoxic outcomes in experimental studies.
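The score distributions plotted in Figures 5 and 6 can be recomputed from the counts reported in this section. The following is a small sketch; the variable and function names are our own, and only the counts come from the text.

```python
# Counts per genotoxicity score, as reported in Section 3.2.
tsca_counts = {1: 9507, 2: 103, 3: 8949, 4: 1812}
poc_counts = {1: 91, 2: 6, 3: 14, 4: 114}


def summarize(counts):
    """Return (total, percentage share per score, combined share of scores 3 and 4)."""
    total = sum(counts.values())
    shares = {score: 100 * n / total for score, n in counts.items()}
    positive = shares[3] + shares[4]  # predicted plus experimental positives
    return total, shares, positive


if __name__ == "__main__":
    for name, counts in (("TSCA Active List", tsca_counts), ("POC", poc_counts)):
        total, shares, positive = summarize(counts)
        print(f"{name}: n={total}, negative={shares[1]:.2f}%, positive={positive:.2f}%")
```

Grouping scores 3 and 4 as "positive" is one way to compare the two sets; keeping them separate preserves the distinction between predicted and measured evidence.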
As a specific example, to demonstrate interpretability of the information, consider 2-nitrochlorobenzene (DTXSID0020280), which is included in the TSCA 238 POC set of chemicals. This chemical had 88 individual records associated with it, both experimental (63 records) and in silico (25 records). There were 49 Ames and 14 clastogenicity experimental results. In addition, predictions for Ames and clastogenicity were made using the different tools culminating in 5 Ames in silico outcomes and 2 clastogenicity in silico outcomes. The overall outcome generated by the workflow was a value of 4, indicating evidence of experimental genotoxicity. This is supported by the 30 Ames positive results (19 of the 49 Ames studies were negative). Evidence of clastogenicity was less pronounced, with many equivocal or negative outcomes in chromosome aberration studies (7 negative, 4 equivocal and 3 positive results). In silico outcomes corresponded with the experimental results in terms of predicting an Ames positive (4 out of the 5 models predicted positive results), and 1 out of the 2 clastogenicity models predicted a positive outcome.
The performance of the in silico predictions was compared to the overall scores made on the basis of only experimental data for the ‘TSCA Active List’ set. The set of overall calls for the ‘TSCA Active List’ set was filtered to retain only outcomes from experimental results. Overall calls made from experimental data were identified for 6194 unique substances. The tiered workflow was then adapted to only consider predicted data and re-applied to the in silico dataset created in section 2.1.6. The outcomes from applying this adapted workflow were merged with the outcomes based on experimental data. For the ‘TSCA Active List’ set, predicted calls could be made for 5230 of the 6194 substances that had experimental outcomes. Of the 5230 substances, 3636 were categorized as true negatives and 1594 as true positives. A confusion matrix was constructed to compute summary performance metrics. The balanced accuracy of the in silico predicted genotoxicity categorizations was calculated as 57.53%, with a sensitivity of 76.72% and a specificity of 38.33%. Given the fair balanced accuracy of the in silico predictions, a more in-depth assessment was undertaken to evaluate the performance of additional in silico models as well as their combinations. A Naïve Bayes consensus model was ultimately developed using combinations of QSAR models and structural alert predictions. The ‘best’ consensus model was found to have a balanced accuracy of 81.2%, a sensitivity of 87.24% and a specificity of 75.20%. This is described in depth in the companion manuscript [41].
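For reference, balanced accuracy is the mean of sensitivity and specificity. The sketch below shows how the metrics follow from a confusion matrix and checks the reported balanced accuracies against the reported sensitivity and specificity. The function names are ours, and the confusion-matrix counts in the usage example are illustrative only, since the underlying cell counts are not given in the text.

```python
def rates_from_confusion(tp, fn, tn, fp):
    """Sensitivity and specificity from confusion-matrix cell counts."""
    sensitivity = tp / (tp + fn)  # fraction of experimental positives predicted positive
    specificity = tn / (tn + fp)  # fraction of experimental negatives predicted negative
    return sensitivity, specificity


def balanced_accuracy(sensitivity, specificity):
    """Mean of sensitivity and specificity (same units as the inputs)."""
    return (sensitivity + specificity) / 2


# Reported rates (in percent) for the workflow predictions and the consensus model.
workflow_ba = balanced_accuracy(76.72, 38.33)
consensus_ba = balanced_accuracy(87.24, 75.20)

if __name__ == "__main__":
    print(f"workflow balanced accuracy:  {workflow_ba:.3f}%")
    print(f"consensus balanced accuracy: {consensus_ba:.3f}%")
    # Illustrative counts only, not taken from the study.
    print(rates_from_confusion(tp=8, fn=2, tn=6, fp=4))
```

Because balanced accuracy weights the two classes equally, it is less affected by the imbalance between experimental negatives and positives than overall accuracy would be.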
4. Conclusions
The tiered workflow described permitted a semi-automated assessment of a large number of chemicals based on relevant genotoxicity and carcinogenicity data to help inform potential candidates for prioritization under amended TSCA. Large databases that could be queried in various ways were not available in 1976, and decades of research since have identified the most relevant assays to use for the assessment of genotoxicity. As indicated by an earlier analysis [40], >13,000 organic chemicals out of 65,725 in commercial use might reasonably be expected to be mutagenic. Based on the studies cited earlier, a subset of these mutagens would be anticipated to be rodent carcinogens, and a subset of these would be anticipated to be human carcinogens. Application of the tiered approach described here may provide information helpful in the process of identifying potential candidate chemicals for prioritization under amended TSCA.
Acknowledgments
We thank Drs. John Cowden and Amar V. Singh (U.S. EPA) for their role in addressing the provenance of data and facilitating the QC of the data for the proof-of-concept chemicals. We also thank Katie Paul Friedman and Tony Williams for their helpful comments on the manuscript. This work was funded by the intramural research program of the Office of Research and Development of the U.S. Environmental Protection Agency. This project was supported in part by an appointment to the Research Participation Program at the Center for Computational Toxicology and Exposure, U.S. Environmental Protection Agency, administered by the Oak Ridge Institute for Science and Education through an interagency agreement between the U.S. Department of Energy and EPA.
Footnotes
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Disclaimer: This article was reviewed by the Center for Computational Toxicology and Exposure, Office of Research and Development, U.S. EPA, and approved for publication. Approval does not signify that the contents reflect the views of the agency nor does mention of trade names of commercial products constitute endorsement or recommendation for use.
References
- [1] TSCA (1976) Toxic Substances Control Act of 1976. https://www.gpo.gov/fdsys/pkg/STATUTE-90/pdf/STATUTE-90-Pg2003.pdf [Accessed 23 June 2021].
- [2] Markell D. 2010. An overview of TSCA, its history and key underlying assumptions, and its place in environmental regulation. Wash Univ J Law & Policy 32:333–375. https://openscholarship.wustl.edu/law_journal_law_policy/vol32/iss1/11 [Accessed 23 June 2021].
- [3] Waters MD, Stack HF, Brady AL, Lohman PHM, Haroun L, Vainio H. 1988. Use of computerized data listing and activity profiles of genetic and related effects in the review of 195 compounds. Mutat Res 205:295–312.
- [4] Waters MD. 1994. Development and impact of the Gene-Tox Program, Genetic Activity Profiles, and their computerized data bases. Environ Mol Mutagen 23 (Suppl 24):67–72.
- [5] LCSA. 2016. The Frank R. Lautenberg Chemical Safety for the 21st Century Act. https://www.epa.gov/assessing-and-managing-chemicals-under-tsca/frank-r-lautenberg-chemical-safety-21st-century-act [Accessed 23 June 2021].
- [6] Public Law 114–812; 15 USC 2601. 2021. Toxic Substances Control: Findings, policy, and intent. https://uscode.house.gov/view.xhtml?req=(title:15%20section:2601%20edition:prelim) [Accessed 23 June 2021].
- [7] U.S. EPA. A Working Approach for Identifying Potential Candidate Chemicals for Prioritization. 2018. https://www.epa.gov/sites/default/files/2018-09/documents/preprioritization_white_paper_9272018.pdf [Accessed 28 August 2021].
- [8] U.S. EPA. A Proof-of-Concept Case Study Integrating Publicly Available Information to Screen Candidates for Chemical Prioritization under TSCA. U.S. Environmental Protection Agency, Washington, DC, EPA/600/R-21–106, 2021. 10.23645/epacomptox.14878125
- [9] Williams AJ, Grulke CM, Edwards J, McEachran AM, Mansouri K, Baker NC, Patlewicz G, Shah I, Wambaugh JF, Judson RS, Richard AM. 2017. The CompTox Chemistry Dashboard: a community data resource for environmental chemistry. J Cheminform. 9:61. doi: 10.1186/s13321-017-0247-6.
- [10] U.S. EPA. TSCA Work Plan for Chemical Assessments. 2014 Update. 2014. https://www.epa.gov/sites/default/files/2015-01/documents/tsca_work_plan_chemicals_2014_update-final.pdf [Accessed 28 August 2021].
- [11] IARC. 2020. International Agency for Research on Cancer. http://www.iarc.fr/ [Accessed 26 June 2021].
- [12] Krewski D, Al-Zoughool M, Bird M, Birkett N, Billard M, Milton B, Rice JM, Cogliano VJ, Hill MA, Little J, Zielinski JM. 2019a. Analysis of key characteristics of human carcinogens. In: Tumour Site Concordance and Mechanisms of Carcinogenesis. Baan RA, Stewart BW, Straif K (Eds), IARC Sci Pub No 165, WHO Press, Lyon, France: pp 257–282.
- [13] Krewski D, Rice JM, Bird M, Milton B, Collins B, Lajoie P, Billard M, Grosse Y, Cogliano VJ, Caldwell JC, Rusyn II, Portier CJ, Melnick RL, Little J, Zielinski JM. 2019b. Analysis of tumour site concordance. In: Tumour Site Concordance and Mechanisms of Carcinogenesis. Baan RA, Stewart BW, Straif K (Eds), IARC Sci Pub No 165, WHO Press, Lyon, France, pp 211–255.
- [14] Huff J. 1999. Animal and human carcinogens. Environ Health Perspect 107:A341–A342.
- [15] Tennant RW and Spalding J. 1996. Predictions for the outcome of rodent carcinogenicity bioassays: identification of trans-species carcinogens and noncarcinogens. Environ Health Perspect 104 (Suppl 5):1095–1100.
- [16] IRIS. 2020. Integrated Risk Information System. https://www.epa.gov/iris [Accessed 23 June 2021].
- [17] OPP. 2020. Office of Pesticide Programs of the U.S. EPA. https://www.epa.gov/pesticides [Accessed 26 June 2021].
- [18] PPRTV. 2020. Provisional Peer-Reviewed Toxicity Values for Superfund of the U.S. EPA. https://hhpprtv.ornl.gov/ [Accessed 26 June 2021].
- [19] CalEPA. 2020. California Environmental Protection Agency. https://calepa.ca.gov/ [Accessed 26 June 2021].
- [20] RoC. 2020. Report on Carcinogens of the National Toxicology Program. https://ntp.niehs.nih.gov/pubhealth/roc/index-1.html [Accessed 26 June 2021].
- [21] Health Canada. 2013. Federal Contaminated Site Risk Assessment. http://publications.gc.ca/site/eng/387683/publication.html [Accessed 26 June 2021].
- [22] NIOSH. 2012. National Institute for Occupational Safety and Health Occupational Cancer. https://www.cdc.gov/niosh/topics/cancer/npotocca.html [Accessed 26 June 2021].
- [23] NTP (National Toxicology Program). 2019. Genetic Toxicology. https://ntp.niehs.nih.gov/whatwestudy/testpgm/genetic/index.html [Accessed 26 June 2021].
- [24] U.S. EPA. 2019. OncoLogic™ - A Computer System to Evaluate the Carcinogenic Potential of Chemicals. https://www.epa.gov/tsca-screening-tools/oncologictm-computer-system-evaluate-carcinogenic-potential-chemicals [Accessed 26 June 2021].
- [25] DeMarini DM. 2019. Role of genotoxicity in carcinogenesis. In: Tumor Site Concordance and Mechanisms of Carcinogenesis. Baan RA, Stewart BW, Straif K (Eds), IARC Sci Pub No 165, WHO Press, Lyon, France, pp 107–115.
- [26] Smith MT, Guyton KZ, Gibbons CF, Fritz JM, Portier CJ, Rusyn I, DeMarini DM, Caldwell JC, Kavlock RJ, Lambert PF, Hecht SS, Bucher JR, Stewart BW, Baan RA, Cogliano VJ and Straif K. 2016. Key characteristics of carcinogens as a basis for organizing data on mechanisms of carcinogenesis. Environ Health Perspect 124:713–721.
- [27] Kirkland D, Aardema M, Henderson L, Müller L. 2005. Evaluation of the ability of a battery of three in vitro genotoxicity tests to discriminate rodent carcinogens and non-carcinogens: I. Sensitivity, specificity and relative predictivity. Mutat Res 584:1–256.
- [28] Mayer J, Cheeseman M, Twaroski M. 2008. Structure-activity relationship analysis tools: Validation and applicability in predicting carcinogens. Regul Toxicol Pharmacol 50:50–58.
- [29] McCann J, Choi E, Yamasaki E, Ames BN. 1975. Detection of carcinogens as mutagens in the Salmonella/microsome test. Assay of 300 chemicals. Proc Natl Acad Sci USA 72:5135–5139.
- [30] Morita T, Asano N, Awogi T, Sasaki YF, Sao S-I, Shimada H, Sofuni T. 1997. Evaluation of the rodent micronucleus assay in the screening of IARC carcinogens (groups 1, 2A and 2B): The summary report of the 6th collaborative study by CSGMT/JEMS-MMS. Mutat Res 389:3–122.
- [31] Tennant RW, Margolin BH, Shelby MD, Zeiger E, Haseman JK, Spalding J, Caspary W, Resnick M, Stasiewicz S, Anderson B, Minor R. 1987. Prediction of chemical carcinogenicity in rodents from in vitro genetic toxicity assays. Science 236:933–941.
- [32] Zeiger E. 1987. Carcinogenicity of mutagens: predictive capability of the Salmonella mutagenesis assay for rodent carcinogenicity. Cancer Res 47:1287–1296.
- [33] Zeiger E. 1998. Identification of rodent carcinogens and noncarcinogens using genetic toxicity tests: premises, promises, and performance. Reg Toxicol Pharmacol 28:85–95.
- [34] Hernández LG, van Steeg H, Luijten M, van Benthem J. 2009. Mechanisms of non-genotoxic carcinogens and importance of a weight-of-evidence approach. Mutat Res 682:94–109.
- [35] Stewart BW. 2019. Mechanisms of carcinogenesis: from initiator and promotion to the hallmarks. In: Tumor Site Concordance and Mechanisms of Carcinogenesis. Baan RA, Stewart BW, Straif K (Eds), IARC Sci Pub No 165, WHO Press, Lyon, France, pp 93–106.
- [36] Eastmond DA, Hartwig A, Anderson D, Anwar WA, Cimino MC, Dobrev I, Douglas GR, Nohmi T, Phillips DH, Vickers C. 2009. Mutagenicity testing for chemical risk assessment: update of the WHO/IPCS Harmonized Scheme. Mutagenesis 24:341–349.
- [37] OECD. 2015. Guidance Document on Revisions to OECD Genetic Toxicology Test Guidelines. https://www.oecd.org/chemicalsafety/testing/Genetic%20Toxicology%20Guidance%20Document%20Aug%2031%202015.pdf [Accessed 26 June 2021].
- [38] ICH. 2012. International Conference on Harmonisation; Guidance on S2(R1) Genotoxicity Testing and Data Interpretation for Pharmaceuticals Intended for Human Use; Availability. https://www.federalregister.gov/documents/2012/06/07/2012-13774/international-conference-on-harmonisation-guidance-on-s2r1-genotoxicity-testing-and-data [Accessed 26 June 2021].
- [39] Williams R, DeMarini DM, Stankowski LF Jr, Escobar PA, Zeiger E, Howe J, Elespuru RK, Cross KP. 2019. Are all bacterial strains required by OECD mutagenicity test guidelines TG471 needed? Mutat Res 848:503081.
- [40] Zeiger E, Margolin BH. 2000. The proportions of mutagens among chemicals in commerce. Reg Toxicol Pharmacol 32:219–225.
- [41] Pradeep P, Judson R, DeMarini D, Keshava N, Martin TM, Dean J, Gibbons CF, Simha A, Warren SH, Gwinn MR, Patlewicz G. 2021. Evaluation of Existing QSAR Models and Structural Alerts and Development of New Ensemble Models for Genotoxicity Using a Newly Compiled Experimental Dataset. Computational Toxicology, 18, 100167.
- [42] Grulke CM, Williams AJ, Thillanadarajah I, Richard AM. 2019. EPA’s DSSTox database: History of development of a curated chemistry resource supporting computational toxicology research. Computational Toxicology 12: 100096.
- [43] OECD. 2016. Test No. 476: In Vitro Mammalian Cell Gene Mutation Tests using the Hprt and xprt genes. OECD Guidelines for the Testing of Chemicals, Section 4. OECD Publishing, Paris, 10.1787/9789264264809-en.
- [44] Patlewicz G, Fitzpatrick JM. 2016. Current and future perspectives on the development, evaluation, and application of in silico approaches for predicting toxicity. Chem Res Toxicol 29:438–451.
- [45] Bakhtyari NG, Raitano G, Benfenati E, Martin TM, Young DM. 2013. Comparison of in silico models for prediction of mutagenicity. J Environ Sci Health Part C 31:45–66.
- [46] Hansen K, Mika S, Schroeter T, Sutter A, ter Laak A, Seger-Hartmann T, Heinrich N, Müller. 2009. Benchmark data set for in silico prediction of Ames mutagenicity. J Chem Inf Model 49:2077–2081.
- [47] Hasselgren C, Ahlberg E, Akahori Y, Amberg A, Anger LT, Atienzar F, Auerbach S, Beilke L, Bellion P, Benigni R, Bercu J, Booth ED, Bower D, Brigo A, Cammerer Z, Cronin MTD, Crooks I, Cross KP, Custer L, Dobo K, Doktorova T, Faulkner D, Ford KA, Fortin MC, Frericks M, Gad-McDonald SE, Gellatly N, Gerets H, Gervais V, Glowienke S, Van Gompel J, Harvey JS, Hillegass J, Honma M, Hsieh JH, Hsu CW, Barton-Maclaren TS, Johnson C, Jolly R, Jones D, Kemper R, Kenyon MO, Kruhlak NL, Kulkarni SA, Kümmerer K, Leavitt P, Masten S, Miller S, Moudgal C, Muster W, Paulino A, Lo Piparo E, Powley M, Quigley DP, Reddy MV, Richarz AN, Schilter B, Snyder RD, Stavitskaya L, Stidl R, Szabo DT, Teasdale A, Tice RR, Trejo-Martin A, Vuorinen A, Wall BA, Watts P, White AT, Wichard J, Witt KL, Woolley A, Woolley D, Zwickl C, Myatt GJ. 2019. Genetic toxicology in silico protocol. Regul Toxicol Pharmacol 107:104403.
- [48] Schultz TW, Diderich R, Kuseva CD, Mekenyan OG. 2018. The OECD QSAR Toolbox starts its second decade. Methods Mol Biol 1800:55–77.
- [49] TEST. 2021. Toxicity Estimation Software Tool (TEST). https://www.epa.gov/chemical-research/toxicity-estimation-software-tool-test [Accessed 26 June 2021].
- [50] OECD Toolbox. 2020. The OECD QSAR Toolbox. https://www.oecd.org/chemicalsafety/risk-assessment/oecd-qsar-toolbox.htm [Accessed 26 June 2021].
 