Author manuscript; available in PMC: 2024 Feb 1.
Published in final edited form as: Comput Toxicol. 2023 Feb 1;25:10.1016/j.comtox.2022.100256. doi: 10.1016/j.comtox.2022.100256

Development of a CSRML version of the Analog Identification Methodology (AIM) fragments and their evaluation within the Generalised Read-Across (GenRA) approach

Matthew Adams 1,2, Hannah Hidle 1,2, Daniel Chang 2, Ann M Richard 2, Antony J Williams 2, Imran Shah 2, Grace Patlewicz 2,*
PMCID: PMC9888031  NIHMSID: NIHMS1862606  PMID: 36733411

Abstract

The Analog Identification Methodology (AIM) was developed over 20 years ago to identify analogues to support read-across at the US Environmental Protection Agency. However, the current public version of the standalone tool, released in 2012, is no longer usable on Windows operating systems supported by Microsoft. Additionally, the structural logic for analogue selection is based on older, customised Simplified molecular-input-line-entry system (SMILES)-type features that are incompatible with modern cheminformatics tools. Given these limitations, a case study was undertaken to explore a more transparent, extensible method of implementing the AIM fragments using Chemical Subgraphs and Reactions Mark-up Language (CSRML). A CSRML file was developed to codify the original AIM fragments, and the extent to which AIM fragments were faithfully replicated was assessed using the AIM Database. The overall mean performance of the AIM-CSRML across all fragments in terms of sensitivity, specificity, and Jaccard similarity was 89.5%, 99.9%, and 82.2%, respectively. Comparing the AIM fragments with public ToxPrints using a large set of ~25,000 substances of regulatory interest to EPA found them to be dissimilar, with an average maximum Jaccard score of 0.24 for AIM and 0.29 for ToxPrint fingerprints. Both fragment sets were then used as inputs in the automated read-across approach, Generalised Read-Across (GenRA), to evaluate the quality of fit in predicting rat acute oral toxicity LD50 values with the coefficient of determination (R2) and root mean squared error (RMSE). The performance of AIM fragments was R2=0.434 and RMSE=0.663 whereas that of ToxPrints was R2=0.477 and RMSE=0.638. A bootstrap resampling using 100 iterations found the mean and the 95% confidence interval of R2 to be 0.349 [0.319, 0.379] for AIM fragments and 0.377 [0.338, 0.412] for ToxPrints. Although AIM and ToxPrints performed similarly in predicting LD50, they differed in their performance at a local level, revealing that their features can offer complementary insights.

Keywords: Analog Identification Methodology (AIM), ToxPrints, CSRML, Generalised Read-Across (GenRA), acute toxicity LD50, ClassyFire

1.0. Introduction

Read-across is a common data gap filling technique used within analogue and category approaches. In a recent publication [1], several publicly available tools for read-across were described, including the Analog Identification Methodology (AIM) tool (https://www.epa.gov/tsca-screening-tools/analog-identification-methodology-aim-tool) relative to a generic read-across workflow. AIM was originally developed over 20 years ago by Syracuse Research Corporation (SRC, Syracuse, NY) under contract to the EPA Office of Pollution Prevention and Toxics (OPPT), Risk Assessment Division. The goal of AIM was to facilitate the identification of candidate structural analogues that could be used to support read-across for new chemical evaluations submitted under the Toxic Substances Control Act (TSCA) Premanufacture Notification (PMN) review process (https://www.epa.gov/reviewing-new-chemicals-under-toxic-substances-control-act-tsca). The TSCA Modernisation Act, signed into law in 2016 [2], endorsed the use of read-across for assessing the hazards of data-poor chemicals, which has prompted renewed interest in the development and application of AIM. As stated in the AIM user manual, AIM profiles any input chemical ‘using a feature set consisting of over 700 atoms, groups, and super fragments that are indexed in a predefined database’. It then matches the input chemical to the corresponding profiles of potential analogues from a built-in inventory of over 86,000 substances, the latter compiled from publicly available sources of empirical property and/or toxicity data. The standalone version of AIM, publicly released in 2012, is no longer usable on Windows operating systems supported by Microsoft, which presents significant technical limitations in its ongoing use by the scientific community. In addition, the program code is unavailable for modification, and the Simplified molecular-input-line-entry system (SMILES)-like language used to encode the original AIM fragments is incompatible with modern SMILES implementations in publicly available cheminformatics tools. Unlike the AIM fragments, SMILES and extensions such as SMARTS (SMILES arbitrary target specification) afford some level of standardisation in how chemical information can be interpreted by different cheminformatics tools.

Concurrently, our research has focused on developing an algorithmic, data-driven, read-across approach, called Generalised Read-Across (GenRA) [3,4] that uses different chemical and/or bioactivity fingerprints to predict in vivo toxicity endpoints. Given the technical shortcomings of AIM, and the potential for using these chemical features to facilitate analogue identification within GenRA, this study explored a means of codifying the AIM fragments into a machine-readable format. Further, a comparison was made with another fingerprint approach, ToxPrints [5], following an initial hypothesis that the AIM fragments and ToxPrints might be similar, given the similar number and types of features as well as their environmental toxicity landscape focus. Lastly, a systematic analysis was performed to evaluate the extent to which AIM fragments were useful in predicting acute oral rodent lethality values using GenRA and how their performance compared with ToxPrints and Morgan fingerprints, the latter of which had been investigated previously in Helman et al. [6].

This study had 3 main objectives:

  1. Codify the AIM fragments in a format that would allow for easy refinement and extension and perform a benchmark comparison of the implementation relative to the original AIM database.

  2. Compare and contrast AIM fragments with ToxPrint fragments using a large set of substances of regulatory interest to EPA.

  3. Investigate the utility of the AIM fragments in the automated read-across approach, GenRA, for predicting acute oral rat toxicity lethality (LD50 values).

Regarding the first objective, the Chemical Subgraphs and Reactions Mark-up Language (CSRML), developed in association with the public ChemoTyper and ToxPrints by Altamira and Molecular Networks GmbH [MN-AM, Nürnberg, Germany] under contract with the U.S. Food and Drug Administration, was used to codify the AIM fragments. CSRML is an XML-based query language that was developed to represent chemical substructures, reaction rules, and reactions as so-called chemotypes or ToxPrints. ToxPrints were devised to integrate atom, bond, or charge information in addition to topology and connectivity information that can be captured by, for example, SMARTS, or SMILES patterns. The CSRML language was developed in parallel with the ToxPrints, a public set of fingerprints that provide coverage of the environmental, regulatory, and commercial-use chemical space [5]. A publicly available ChemoTyper tool Version 1.0 Revision 12976 (chemotyper.org) had been developed to enable ToxPrints (toxprint_V2.0_r711.xml) to be visualised and superimposed on chemical structures and used for querying and fingerprinting chemical structures. Thus, developing a CSRML file to encode AIM fragments would facilitate exchange and refinement using the ChemoTyper tool.

2.0. Materials and Methods

2.1. Codifying AIM fragments into CSRML

Codifying the AIM fragments relied on two sources of information – the ChemACE Manual [https://www.epa.gov/sites/default/files/2015-09/documents/chemace_user_manual.pdf] and the AIM database (AIMDB). The ChemACE manual included an Appendix table depicting drawings of the generalised structural representations of 787 AIM chemical fragments together with a unique ID. Included with the installation folders for the standalone AIM tool [https://www.epa.gov/tsca-screening-tools/analog-identification-methodology-aim-tool] was a database of 86,200 chemical structures (83,277 unique structures based on their SMILES), their SMILES, and Chemical Abstracts Service (CAS) Registry Numbers (CASRN), as well as a list of detected AIM fragments tagged by ID. A structure-data file (sdf) was created for the substances in the AIMDB using the freely available Python package, RDKit [7].

The workflow for converting the AIM fragments into CSRML format was an iterative process (see Figure 1). Each AIM fragment depicted in the Appendix was manually converted into a machine-readable SMARTS format (https://www.daylight.com/dayhtml/doc/theory/theory.smarts.html). SMARTS is a language that allows substructures to be specified as rules for querying purposes. The Chemotype Editor (chemotyper.org) tool was then used to convert each SMARTS to its corresponding chemotype to create an initial AIM CSRML. The initial AIM CSRML and sdf of the AIMDB were then matched within the ChemoTyper tool to generate a binary chemical fingerprint file that could be exported. This exported fingerprint file comprised a binary matrix of AIM fragments for the entire database of AIM chemicals, where an entry of “1” denoted the presence of a fragment within a substance and “0” denoted the absence of a fragment. A list of all AIM fragments matched by the AIM CSRML for a given substance was then extracted from the fingerprint file to be cross-referenced against the corresponding list of AIM fragment IDs that had been annotated in the AIM database. Substances with mismatching AIM fragments were manually reviewed, and the AIM CSRML was revised accordingly.

Figure 1.


Workflow for converting AIM fragments to CSRML.

Per Figure 1: a) SMARTS patterns were created to represent all AIM fragment images as depicted in the original Appendix in the ChemACE Manual; b) SMARTS patterns were then converted into CSRML format using the Chemotype Editor tool (https://chemotyper.org/); c) an sdf of the AIM database was matched against the CSRML using the ChemoTyper tool to generate a fingerprint file; d) a list of fragments was extracted from each chemical-specific fingerprint and matched against the list of AIM fragments reported for that chemical in the AIM database; e) mismatches were identified and each incorrect fragment was manually reviewed to identify what refinements were needed in the SMARTS pattern initially created to capture the chemical concept depicted by the AIM fragment image. The refinement loop continued until the mean performance metrics (specifically the Jaccard similarity) exceeded a threshold of 80%.
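
Steps d) and e) amount to a per-substance set comparison. The following is a minimal sketch in Python, assuming the exported fingerprint file and the AIM database annotations have already been parsed into dictionaries of fragment-ID sets. The function name `find_mismatches`, the substance keys, and the toy fragment assignments are hypothetical and are not taken from the authors' notebooks.

```python
def find_mismatches(aim_db, csrml):
    """Compare annotated AIM fragment IDs with AIM-CSRML matches per substance.

    Both arguments map a substance identifier to a set of fragment IDs.
    Returns only the substances whose two fragment sets differ.
    """
    mismatches = {}
    for substance, annotated in aim_db.items():
        matched = csrml.get(substance, set())
        missed = annotated - matched   # in the AIM database only
        extra = matched - annotated    # in the AIM-CSRML output only
        if missed or extra:
            mismatches[substance] = {"missed": missed, "extra": extra}
    return mismatches


# Toy data (hypothetical substances and fragment assignments).
aim_db = {"S1": {"19", "40"}, "S2": {"61"}, "S3": {"19", "68C"}}
csrml = {"S1": {"19", "40"}, "S2": {"61", "40"}, "S3": {"19"}}

report = find_mismatches(aim_db, csrml)
for substance, diff in sorted(report.items()):
    print(substance, sorted(diff["missed"]), sorted(diff["extra"]))
```

Substances that appear in the report are the ones that would be queued for manual review of the corresponding SMARTS pattern.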

2.2. Evaluating AIM CSRML

The correspondence between the AIM-CSRML and the AIM database IDs initially relied on the first 5000 entries in the AIM database. Subsequent subsets were enriched by the mismatches identified that needed to be resolved. A confusion matrix for each AIM fragment was derived to evaluate performance metrics using accuracy, precision, sensitivity, specificity, and Jaccard similarity. Accuracy is defined as:

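$$\text{Accuracy} = \frac{TP + TN}{TP + FN + FP + TN}$$

where TP = number of True Positives, TN = number of True Negatives, FN = number of False Negatives and FP = number of False Positives.

$$\text{Precision} = \frac{TP}{TP + FP}$$

$$\text{Sensitivity} = \frac{TP}{TP + FN}$$

$$\text{Specificity} = \frac{TN}{FP + TN}$$

and Jaccard similarity is defined as

$$\text{Jaccard} = \frac{TP}{TP + FN + FP}$$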

The confusion matrix for fragment ID 68C is presented in Table 1. The starting point to create the confusion matrices for each fragment relied on an initial dataframe which summarised all the substances evaluated, together with three further columns containing the list of AIM fragments for those substances, the corresponding list of AIM-CSRML fragments identified and the list of fragments that differed between the 2 fragment sets (created in step d) of the workflow depicted in Figure 1). For a given fragment ID, the initial dataframe was first filtered to return only those substances that contained that fragment amongst the lists of AIM fragments. The number of True Positives (TP) was determined by filtering this subset of substances by the presence of that fragment amongst the AIM-CSRML list. In this case, fragment ID 68C appeared in 140 substances on the basis of the AIM fragments, but in only 113 of the corresponding AIM-CSRML fragment lists; thus the TP was calculated to be 113. To determine the False Positives (FP), the initial dataframe was instead filtered to return those substances where the fragment of interest showed up in the ‘difference in the list of fragments’ column. For fragment ID 68C, there were 31 substances where 68C showed up in the list of differences and of those, 4 were present in only the AIM-CSRML fragment lists, i.e., the FP was 4 and in turn the FN was 27. The True Negatives (TN) were determined by subtracting (TP+FN+FP) from the total number of substances in the initial dataframe, i.e. 73179-(113+27+4) = 73035 (see code Notebook 01 for further details of how this initial dataframe was created and how these confusion matrices were computed across all fragment IDs).
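
As a worked check of the arithmetic for fragment ID 68C, the short Python sketch below recomputes TN from the reported counts and evaluates the five metrics. It uses the sensitivity definition given in Table 2, TP/(TP+FN). The helper name `confusion_metrics` is hypothetical, and the resulting metric values are derived here and are not quoted from the paper.

```python
def confusion_metrics(tp, fn, fp, tn):
    """Return accuracy, precision, sensitivity, specificity and Jaccard as a dict."""
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (fp + tn),
        "jaccard": tp / (tp + fn + fp),
    }


# Counts reported in the text for fragment ID 68C.
n_substances = 73179
tp, fn, fp = 113, 27, 4
tn = n_substances - (tp + fn + fp)

metrics = confusion_metrics(tp, fn, fp, tn)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Because true negatives dominate the counts, accuracy and specificity are close to 1 for almost any fragment, which is why the Jaccard similarity (which ignores TN) was the more informative stopping criterion.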

Table 1:

Confusion matrix for an AIM fragment, illustrated for fragment ID 68C

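| | AIM-CSRML: fragment present | AIM-CSRML: fragment absent |
|---|---|---|
| AIM database: fragment present | TP = 113 | FN = 27 |
| AIM database: fragment absent | FP = 4 | TN = 73035 |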

The mean performances of these metrics are reported in the Results section 4.2.

Refinements to the AIM CSRML model relied upon manual revisions of the CSRML to resolve the mismatches. This iterative process continued until the mean performance reached a minimum threshold (Jaccard score of 80% or greater) as denoted in Figure 1. Additional performance metrics were computed to characterise the entire AIM database, including the average number of fragments per substance, the distribution of substances per fragment and the mean number of mismatches per substance. Ultimately, the goal was to create a set of CSRML features that not only replicated the fingerprint patterns created by the corresponding AIM fragments, but that captured the same or similar chemical concepts depicted in the original AIM fragment drawings in the Appendix of the ChemACE manual. The final AIM CSRML file is, henceforth, referred to as AIM-CSRML.

2.3. Comparing the AIM fingerprints to ToxPrints

AIM-CSRML fingerprints were compared to the public ToxPrints using an extensive list of substances of interest to EPA. The expectation was that the AIM fragments and ToxPrints might be closely aligned since both had been nominally developed to provide environmental, regulatory, and commercial-use chemical space coverage and both were similar in the sorts of fragment descriptions and concepts captured based on what was illustrated in the AIM ACE manual and how the ToxPrints were represented within the ChemoTyper tool. The batch download functionality within the EPA CompTox Chemicals Dashboard (https://comptox.epa.gov/dashboard/; [8]) was used to download a large set of substances of regulatory interest to EPA. The following lists were downloaded from the Dashboard: Integrated Risk Information System (IRIS), Provisional Peer-Reviewed Toxicity Values (PPRTVWEB), Toxic Substances Control Act (TSCA), Office of Pesticide Programs Information Network (OPPIN), Toxics Release Inventory (TRIRELEASE), EPA’s Toxicity Forecaster (ToxCast) (TOXCAST), and the Tox21 Screening Library (TOX21SL). Duplicates were removed to result in a final list of 26,099 unique substances. The Tox21SL [9] and ToxCast [10] lists represent all the substances that have undergone screening in various high-throughput screening assays as part of the Tox21 programme or by EPA as part of ToxCast. The IRIS and PPRTV lists represent substances with authoritative toxicity reference values or preliminary peer reviewed toxicity values following assessments. OPPIN represents substances under the purview of the Office of Pesticide Programs, whereas TRI substances are chemicals that pose significant adverse health effects. The TSCA set represents the non-confidential portion of the active TSCA inventory that has been curated and mapped to other content surfaced on the Dashboard.

The ChemoTyper tool was used to generate an AIM-CSRML and a ToxPrint binary chemical fingerprint file for the set of 26,099 substances. The distribution of fragments across the substances in the dataset was plotted to provide an initial perspective of overall coverage using each fingerprint set. The maximum pairwise Jaccard similarity was then computed between the two fingerprint sets to identify which ToxPrint and AIM features were most similarly distributed across the dataset to help diagnose the commonalities and differences between the two fingerprint sets. The following steps were performed: 1) A comparison matrix was created comprising all 783 AIM-CSRML fragments as a row index and all 729 ToxPrints as a column index. The cells were populated by confusion matrices reflecting the counts of substances containing one or another fragment. 2) A comparison was made across all ToxPrint fragments for a given AIM-CSRML fragment to identify which had the highest Jaccard similarity. 3) The converse comparison was then performed to identify the AIM-CSRML fragment with the highest Jaccard similarity for a given ToxPrint. This indirect means of comparing the fragments was undertaken because, although the ToxPrints can be browsed as structural depictions using the ChemoTyper tool with the associated CSRML file, their exact structural patterns (e.g., as SMARTS or similar) are not publicly accessible.
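
The three steps can be sketched in Python as follows. This is an illustrative simplification, assuming each fragment is represented by the set of substances that contain it, so that the Jaccard similarity is computed directly without storing full confusion matrices. The function names and the toy fragments are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets of substance identifiers."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_matches(rows, cols):
    """For each fragment in `rows`, find the most similar fragment in `cols`.

    Both arguments map a fragment name to the set of substances containing it.
    Returns {row_fragment: (best_col_fragment, jaccard_similarity)}.
    """
    result = {}
    for row_name, row_subs in rows.items():
        scored = [(jaccard(row_subs, col_subs), col_name)
                  for col_name, col_subs in cols.items()]
        score, name = max(scored)  # ties are broken by fragment name
        result[row_name] = (name, score)
    return result


# Toy data: hypothetical fragments and the substances that contain them.
aim = {"aim_A": {1, 2, 3, 4}, "aim_B": {5, 6}}
toxprint = {"tp_X": {1, 2, 3}, "tp_Y": {4, 5, 6, 7}, "tp_Z": {8}}

aim_to_tp = best_matches(aim, toxprint)   # step 2
tp_to_aim = best_matches(toxprint, aim)   # step 3 (converse comparison)

# Mean of the per-fragment maximum Jaccard similarities for the AIM side.
mean_max_aim = sum(score for _, score in aim_to_tp.values()) / len(aim_to_tp)
print(aim_to_tp, tp_to_aim, mean_max_aim)
```

Averaging the per-fragment maxima in each direction gives one summary value per fingerprint set, and the two values need not be equal because the comparison is not symmetric.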

2.4. Comparing AIM-CSRML and ToxPrints using a GenRA approach

Read-across is a commonly used data gap filling technique whereby endpoint information for one substance (the source analogue substance) is used to predict the same endpoint for another substance (the target substance) that shares a ‘similar’ characteristic, typically based on structural similarity. Although structural similarity is often the starting point to identify candidate source analogues, many other considerations come into play in terms of evaluating their suitability for read-across for hazard and risk assessment purposes. The mechanistic, toxicological, reactivity, and metabolic similarity of the candidate analogues also need to be assessed. Substances that are structurally related may metabolise by very divergent pathways leading to the formation of reactive metabolites as a result of differing functional groups or alkyl groups that can alter the reactivity profile and in turn the toxicity profile. For example, hexane and pentane are part of the same homologous series, but 2,5-hexanedione, a metabolite of hexane, is implicated in the neurotoxicity observed for hexane. Equally, structurally similar analogues may exhibit different profiles in terms of their biological (mechanistic) similarity; for example, differences in which receptors are activated in high-throughput assays may also translate into differences in toxicological similarity. Here, the Generalised Read-Across (GenRA) approach [3,4], as implemented in the Python package genra-py [11], was used to make rat acute oral LD50 predictions using both AIM-CSRML and ToxPrint fingerprints. The acute oral rat toxicity dataset used was the same one evaluated in Helman et al. [6] using Morgan circular fingerprints [12]. This dataset was used since it is reasonably large, with data available for thousands of chemicals to enable a systematic baseline performance evaluation. Overall ‘global’ performance characteristics (R2, RMSE) using AIM-CSRML and ToxPrints were derived and compared with those reported in Helman et al. [6].
‘Local’ performance was summarised on the basis of groups of substances, where groups (categories) were defined using the ClassyFire automated chemical classification scheme [13]. This scheme permitted performance to be summarised across categories of substances sharing the same chemical taxonomy to help uncover the types of substances that were particularly well or poorly predicted using AIM-CSRML or ToxPrint fingerprints. ClassyFire uses chemical structures and structural features to automatically assign chemicals to a taxonomy comprising >4800 different categories. The taxonomy consists of up to 11 different levels (Kingdom, SuperClass, Class, Subclass, etc.) with each category defined by structural rules. The ClassyFire webserver, available at http://classyfire.wishartlab.com/, was used to process the entire set of substances in the LD50 dataset using DTXSID and InChI keys (IUPAC International Chemical Identifier) as identifiers to assign Kingdom, Superclass, Class, Subclass and Substituent information. Class and/or Subclass were found to offer the best compromise between structural information vs. group membership size. For each group, the mean absolute error was calculated for the AIM-CSRML and ToxPrint fragments to identify those ClassyFire groups where the performance was much improved or far worse relative to the ‘global’ performance. Selected ClassyFire groups, which resulted in a large difference in performance between the AIM-CSRML and ToxPrint fragments, were further investigated. In addition, prototypical substances that were particularly poorly predicted were identified to better understand the basis for the poor predictions, whether that be missing fragments or lack of candidate source analogues with sufficiently high similarities.
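
The per-group error summary can be sketched as below. This is a minimal illustration, assuming the predictions are available as (group, observed, predicted) tuples. The function name `group_mae`, the group labels, and all values are hypothetical.

```python
from collections import defaultdict


def group_mae(records):
    """Mean absolute error per group.

    `records` is an iterable of (group, observed, predicted) tuples.
    """
    errors = defaultdict(list)
    for group, observed, predicted in records:
        errors[group].append(abs(observed - predicted))
    return {group: sum(vals) / len(vals) for group, vals in errors.items()}


# Toy data: hypothetical groups with observed and predicted values.
aim_records = [("Group A", 2.0, 2.5), ("Group A", 3.0, 2.0), ("Group B", 1.0, 1.5)]
tp_records = [("Group A", 2.0, 2.25), ("Group A", 3.0, 2.75), ("Group B", 1.0, 2.5)]

aim_mae = group_mae(aim_records)
tp_mae = group_mae(tp_records)

# Positive difference: ToxPrints predict the group better; negative: AIM-CSRML does.
difference = {group: aim_mae[group] - tp_mae[group] for group in aim_mae}
print(difference)
```

Sorting the groups by this difference is one simple way to shortlist the categories where the two fingerprint sets diverge most.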

3.0. Data analysis and processing

The ChemoTyper and ChemoType Editor tools were used to create and evaluate the AIM-CSRML and generate associated AIM-CSRML and ToxPrint fingerprint files.

Data processing was conducted using the Anaconda distribution [14] of Python 3.8 and associated libraries – RDKit [7], scikit-learn [15], Pandas [16], NumPy [17], Matplotlib visualisation tools [18], Seaborn [19] and the statistical library SciPy [20] within a Jupyter [21] lab environment. GenRA analysis was conducted using the genra-py package v1.7 [11].

The code repository and associated data files supporting this analysis are available at github.com/g-patlewicz/aim.

4.0. Results and Discussion

4.1. AIM-CSRML Fragments

There were 831 AIM fragment IDs tabulated in the original ChemACE Appendix document (see supplementary information). Six of the fragment IDs were not associated with any fragment pattern. Closer inspection found these contained duplicates; for example, fragments 64E and 334C both reference Gold [Au]. Removal of duplicates from the larger set resulted in 787 unique fragments. On the other hand, the AIM database contained 767 unique fragments based on the ID tag. In total, there were 568 fragments in common between the Appendix and AIM database.

The CSRML format uses subgraphs to represent a substructure target query such that sometimes a combination of subgraphs had to be used to fully describe a fragment. The final CSRML developed after several iterations contained 903 subgraphs covering 783 unique fragments (see Supplementary information). Some fragments had to be represented by multiple subgraphs due to CSRML’s inability to handle recursive SMARTS patterns. For example, ortho- and meta-isomers could not be captured within the same subgraph. Figure S1 shows how the first six Appendix fragments were captured as corresponding AIM-CSRML fragments.

4.2. AIM-CSRML Evaluation

The initial AIM-CSRML fragment mean performance characteristics relative to the final CSRML version after several iterations are summarised in Table 2. Precision, sensitivity, and Jaccard similarity all improved substantially following several iterations. The lack of complete and unambiguous documentation prevented a complete correspondence between the AIM-CSRML and the AIM database assignments from being reached. Furthermore, various discrepancies were uncovered between the ChemACE manual Appendix and the AIM database during the iterations of refinements that were undertaken to resolve mismatches. If there were conflicts between the Appendix and the AIM database, the AIM database annotation was relied upon as the ‘true’ representation. There were three main types of conflicts encountered:

  1. The structure depicted in the Appendix differed from what was matched in the AIMDB, or the comments provided in the Appendix were contradictory (e.g., R group incorrectly defined)

  2. The fragment ID in the Appendix was not present in the AIMDB or vice versa

  3. Duplicates, e.g., two IDs were assigned to the same structural feature

Table 2.

Comparison of initial CSRML and final CSRML versions of the reproduced AIM fragments in terms of their mean performance metrics

| Metric (mean across all fragments) | CSRML version AIM_V0.1 (initial version, pre-tuning) | CSRML version AIM_V1.1 (final version, post-tuning) |
|---|---|---|
| Accuracy, (TP+TN)/Total | 99.2% | 99.9% |
| Specificity, TN/(FP+TN) | 99.15% | 99.9% |
| Sensitivity, TP/(TP+FN) | 36.1% | 89.5% |
| Precision, TP/(TP+FP) | 56.1% | 90.6% |
| Jaccard, TP/(TP+FP+FN) | 27.1% | 82.2% |

Figure 2 depicts the mean Jaccard similarity across the fragments. After several iterations, the majority of fragments exceeded the mean Jaccard similarity of 82.2% and no further refinements of the CSRML were made. This shift in performance certainly highlights the level of ambiguity in the fragment descriptions as described in the Appendix which was initially relied upon when first encoding them into SMARTS and their corresponding CSRML formats.

Figure 2:


Histogram and Cumulative Distribution Function plots for the mean Jaccard similarity across the AIM-CSRML fragments (see code Notebook 01).

Since each substance in the AIM database contains multiple fragments, it was helpful to explore the number of AIM CSRML and AIM ID fragment mismatches relative to the number of fragments in a substance to prioritise where to focus subsequent efforts in resolving mismatches. Looking simply at counts of the number of mismatches would not differentiate a substance containing 10 fragments with 1 mismatch from a substance with 10 fragments but 4 mismatches. Figure 3a shows the distribution of mismatches whereas Figure 3b shows the distribution of fragments themselves.

Figure 3:


a) Histogram of the number of fragment mismatches per compound b) Histogram of the number of fragments per compound (see code Notebook 02)

4.3. Comparing AIM-CSRML and ToxPrints

The diversity coverage of the AIM-CSRML and ToxPrint fingerprints was compared using a large set of substances of regulatory interest for which the respective fingerprint files were generated. Figures 4a and b show the distribution of AIM-CSRML and ToxPrint fragments across the set of more than 26,000 substances. The AIM-CSRML fragments appear “less generalised” in the sense that, on average, AIM-CSRML fragments show up in fewer structures than ToxPrints (as noted in Table 3 by the mean fragment frequency). That is, AIM-CSRML fragments are much more specific in their inclusion and exclusion criteria, so fewer structures present a given fragment. ToxPrints, in contrast, form a hierarchy from generalised to more specific fragments, and as a result more structures are matched by a given fragment pattern.

Figure 4.


Overall distributions of a) AIM-CSRML and b) ToxPrint fragments across the 26,099 substances of regulatory interest (see code Notebook 03).

Table 3.

Summary Jaccard metrics for the AIM-CSRML and ToxPrint comparisons

| Metric | AIM-CSRML average | ToxPrint average |
|---|---|---|
| Number of compounds with fragment | 285 | 454 |
| Mean maximum Jaccard similarity | 0.24 | 0.29 |

Figures 5a and b depict the distribution of the maximum Jaccard similarity values for both fragment sets. Here, a comparison was performed across all AIM-CSRML fragments relative to a given ToxPrint to identify the most similar one using the maximum Jaccard similarity as a metric. The comparison was performed vice versa by comparing each AIM fragment across all ToxPrints to identify its closest match, i.e. based on highest Jaccard similarity. Contrary to our initial expectations, the AIM-CSRML and ToxPrints were quite different, with the majority of fragments specific to each set (Table 3). This would obviously impact the membership of neighbourhoods generated for a read-across scenario and in turn any predictions of toxicity.

Figure 5:


Maximum Jaccard similarity for the AIM-CSRML vs ToxPrint fingerprint sets and vice versa (see code Notebook 03).

Table 4 highlights a handful of AIM-CSRML fragments and their corresponding top 2 most similar ToxPrints.

Table 4:

Illustrative AIM-CSRML fragments and their corresponding top 2 most similar ToxPrints.

| AIM-CSRML fragment | Most similar ToxPrints | Jaccard similarity |
|---|---|---|
| Aromatic Carbon, fragment ID 19 | ring:aromatic_benzene | 0.905 |
| | chain:aromaticAlkane_Ph-C1_acyclic_generic | 0.513 |
| C(=O)O ester, aliphatic attach, fragment ID 40 | bond:C(=O)O_carboxylicEster_alkyl | 0.694 |
| | bond:C(=O)O_carboxylicEster_acyclic | 0.679 |
| OC(=O)N carbamate, fragment ID 61 | bond:C(=O)N_carbamate | 0.902 |
| | bond:C(=O)N_carboxamide_generic | 0.086 |

4.4. GenRA predictions using AIM-CSRML and ToxPrints – ‘global’ performance

To assess the ability of the two fingerprint sets to capture relevant features pertaining to toxicity, GenRA was used to predict rat acute oral LD50 values based on AIM-CSRML and ToxPrint fingerprints. Firstly, a grid search using a range of the number of neighbours (1–15) and 2 different distance metrics (Jaccard and Euclidean) was conducted within a 5-fold cross-validation procedure. The best score (coefficient of determination (R2)) from the grid search was achieved using Jaccard as a metric for both fingerprint types although the optimal number of neighbours differed, with 8 neighbours for AIM-CSRML fragments and 6 for ToxPrint fingerprints. A leave-one-out cross-validation was then conducted on the entire dataset, using 8 neighbours for AIM-CSRML and 6 neighbours for ToxPrints. A linear regression was used to fit the predicted and observed LD50 for all chemicals. The quality of fit was evaluated using the coefficient of determination (R2), which captures the percentage of variance explained by the model, and the RMSE, which is the square root of the variance of the residuals. The reported R2 (and RMSE) using the AIM-CSRML and ToxPrint fingerprints are shown in Table 5, with a comparison to Morgan fingerprints that had been used in Helman et al. [6].
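
To make the leave-one-out procedure concrete, the sketch below implements a small similarity-weighted nearest-neighbour predictor in plain Python. It is not the genra-py API. It assumes that a prediction is the Jaccard-similarity-weighted average of the k nearest neighbours' values, that R2 is taken from a linear fit of observed on predicted values, and that RMSE is computed on the prediction errors. All function names and the toy data are hypothetical, and in practice k would come from the grid search described above.

```python
import math


def jaccard(a, b):
    """Jaccard similarity between two sets of 'on' fingerprint bits."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def loo_predictions(fingerprints, values, k):
    """Leave-one-out, similarity-weighted k-nearest-neighbour predictions."""
    predictions = []
    for i, target in enumerate(fingerprints):
        sims = [(jaccard(target, fp), values[j])
                for j, fp in enumerate(fingerprints) if j != i]
        neighbours = sorted(sims, key=lambda pair: pair[0], reverse=True)[:k]
        weight = sum(s for s, _ in neighbours)
        if weight > 0:
            predictions.append(sum(s * v for s, v in neighbours) / weight)
        else:  # no similar neighbour at all: fall back to the plain mean
            predictions.append(sum(v for _, v in neighbours) / len(neighbours))
    return predictions


def r2_and_rmse(observed, predicted):
    """R2 of a linear fit of observed on predicted, and RMSE of prediction errors."""
    n = len(observed)
    mo, mp = sum(observed) / n, sum(predicted) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    var_o = sum((o - mo) ** 2 for o in observed)
    var_p = sum((p - mp) ** 2 for p in predicted)
    r2 = cov ** 2 / (var_o * var_p)  # squared Pearson correlation
    rmse = math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)
    return r2, rmse


# Toy data: hypothetical fingerprints (sets of bit indices) and LD50-like values.
fps = [{1, 2, 3}, {1, 2, 4}, {1, 2}, {7, 8, 9}, {7, 8}, {8, 9, 10}]
ld50 = [1.0, 1.2, 1.1, 3.0, 3.2, 2.9]

pred = loo_predictions(fps, ld50, k=2)
r2, rmse = r2_and_rmse(ld50, pred)
print([round(p, 3) for p in pred], round(r2, 3), round(rmse, 3))
```

The toy set has two well-separated clusters, so the fit is far better than anything expected on real LD50 data; the sketch is only meant to show the mechanics of the evaluation.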

Table 5.

Summary performance metrics for predicting rat acute oral LD50 using AIM-CSRML and ToxPrint fragments, with Morgan fingerprint performance metrics shown for comparison.

| Fingerprint type | No. of chemicals | No. of neighbours | R2 | RMSE |
|---|---|---|---|---|
| AIM-CSRML | 6968 | 8 | 0.434 | 0.663 |
| ToxPrints | 6972 | 6 | 0.477 | 0.638 |
| Morgan (radius = 3, 2048 bit length) | 6975 | 8 | 0.516 | 0.614 |

Optimisation of the neighbourhood size for the ToxPrints resulted in a slightly higher performance at predicting LD50 based on the R2 and RMSE of the fitted linear regressions, than from using AIM-CSRML fragments. In the original analysis (Helman et al [6]) Morgan fingerprints fared better than using either of the fingerprints in this study. Nonetheless the results in Table 5 do provide a convenient baseline comparison of the performance that can be achieved with these 3 fingerprint types, contrasting two types of fixed fragment sets with a hashed Morgan fingerprint set.

A cross-validation was then performed using the optimal number of neighbours and 100 75:25 train-test splits of the data, recording the R2 of each iteration’s test predictions. For 100 cross-validation iterations, the median of the R2 values was 0.349 for the AIM-CSRML fragments with lower and upper 95th percentile values of 0.319 and 0.379. In contrast, the median for the ToxPrints was 0.377 with lower and upper 95th percentiles of 0.338 and 0.412. Figures 6a and b show the distribution of R2 for the train-test splits for the AIM-CSRML and ToxPrints. The resampled R2 values provide a more realistic estimate of the generalised performance for predicting LD50, with ToxPrints still faring better than AIM-CSRML fragments. That said, the performance still falls short compared with the Morgan fingerprints used in Helman et al. [6] as well as other models cited within and since (see Mansouri et al. [22], where the consensus model performance reported for LD50 was an R2=0.67 and RMSE=0.47). Neither of the fragment types appears to be able to explain the variance in the LD50 well. Aspects beyond structure, such as mechanistic similarity, may help: the insights discussed in Edwards et al. [23] offer novel approaches to incorporating biological information that could improve performance.
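
The resampling loop can be sketched as follows. This is an illustration under stated assumptions: the scoring callable is a placeholder for fitting GenRA on the training split and returning the test-set R2, and the lower and upper bounds are taken here to be the 2.5th and 97.5th percentiles, which the text does not state explicitly. All names are hypothetical.

```python
import random
import statistics


def repeated_split_scores(n_items, score_fn, n_iter=100, test_fraction=0.25, seed=0):
    """Score `n_iter` random train-test splits.

    `score_fn(train_idx, test_idx)` must return one number (e.g. the test-set R2).
    """
    rng = random.Random(seed)
    indices = list(range(n_items))
    n_test = round(n_items * test_fraction)
    scores = []
    for _ in range(n_iter):
        rng.shuffle(indices)
        test_idx, train_idx = indices[:n_test], indices[n_test:]
        scores.append(score_fn(train_idx, test_idx))
    return scores


def summarise(scores, lower=2.5, upper=97.5):
    """Median plus lower and upper percentiles of the resampled scores."""
    cuts = statistics.quantiles(scores, n=1000, method="inclusive")
    # cuts[i] is the (i + 1) / 10 percentile.
    low = cuts[round(lower * 10) - 1]
    high = cuts[round(upper * 10) - 1]
    return statistics.median(scores), low, high


# Placeholder scorer (hypothetical): the real analysis would fit GenRA on the
# training indices and return the R2 of the test predictions.
values = [0.1 * i for i in range(40)]


def toy_score(train_idx, test_idx):
    train_mean = sum(values[i] for i in train_idx) / len(train_idx)
    test_mean = sum(values[i] for i in test_idx) / len(test_idx)
    return -abs(train_mean - test_mean)


scores = repeated_split_scores(len(values), toy_score)
median, low, high = summarise(scores)
print(round(median, 3), round(low, 3), round(high, 3))
```

Fixing the seed makes the splits reproducible, which also allows the same splits to be reused for both fingerprint sets so that their scores can be compared pairwise.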

Figure 6.

Resampled R2 for AIM-CSRML fragments (6a) versus resampled R2 for ToxPrints (6b)
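The resampling procedure described above can be sketched as follows. This is a minimal illustration, not the authors' code: the `fit_predict` callable, the toy nearest-neighbour model, the synthetic data, and the choice of the 2.5th and 97.5th percentiles as the "lower and upper" bounds are all assumptions made for the example.

```python
import random
import statistics


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = statistics.fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def resample_r2(X, y, fit_predict, n_iter=100, test_frac=0.25, seed=0):
    """Score `fit_predict` on `n_iter` shuffled train-test splits."""
    rng = random.Random(seed)
    idx = list(range(len(y)))
    n_test = round(test_frac * len(y))
    scores = []
    for _ in range(n_iter):
        rng.shuffle(idx)
        test, train = idx[:n_test], idx[n_test:]
        preds = fit_predict([X[i] for i in train], [y[i] for i in train],
                            [X[i] for i in test])
        scores.append(r_squared([y[i] for i in test], preds))
    return scores


def summarise(scores, lower=2.5, upper=97.5):
    """Median plus lower/upper percentiles (nearest index on sorted scores)."""
    s = sorted(scores)

    def pick(q):
        return s[round(q / 100 * (len(s) - 1))]

    return statistics.median(s), pick(lower), pick(upper)


def nearest_neighbour(X_train, y_train, X_test):
    """Toy stand-in model: copy the value of the closest training point."""
    return [y_train[min(range(len(X_train)), key=lambda j: abs(X_train[j] - x))]
            for x in X_test]


# Synthetic 1-D data, purely for illustration.
X = [i / 10 for i in range(200)]
y = [2.0 * x + 1.0 for x in X]
scores = resample_r2(X, y, nearest_neighbour)
median_r2, low, high = summarise(scores)
```

In practice the stand-in model would be replaced by the read-across predictor under evaluation, with fingerprints in place of the one-dimensional toy feature.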

Since the fitted R2 values (shown in Table 5) were quite similar, a paired t-test was conducted on the resampled RMSE values to determine whether there was any significant difference in the mean performance between the two models. Setting the significance level to 0.05, the p-value was computed to be very low (1.66E-14), indicating that the null hypothesis could be rejected and the difference between the mean performances of the AIM-CSRML and ToxPrint-based models was indeed significant.
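The paired t-test used here compares the two models split by split, working on the per-split differences rather than on the two samples independently. A minimal sketch of the test statistic is given below; the RMSE values are hypothetical and only five splits are shown. In a SciPy-based workflow, `scipy.stats.ttest_rel` would return the same statistic together with the p-value.

```python
import math
import statistics


def paired_t_statistic(a, b):
    """t statistic and degrees of freedom for paired samples a and b."""
    if len(a) != len(b):
        raise ValueError("paired samples must have equal length")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean_d = statistics.fmean(diffs)
    sd_d = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    return mean_d / (sd_d / math.sqrt(n)), n - 1


# Hypothetical RMSE values from the same five splits for two models.
rmse_a = [0.70, 0.68, 0.71, 0.69, 0.72]
rmse_b = [0.66, 0.65, 0.66, 0.66, 0.67]
t_stat, dof = paired_t_statistic(rmse_a, rmse_b)
```

With 4 degrees of freedom the two-sided critical value at the 0.05 level is about 2.776, so a statistic of this size would lead to rejecting the null hypothesis of equal mean RMSE.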

4.5. GenRA predictions using AIM-CSRML and ToxPrint fragments – ‘local’ performance

ClassyFire annotations (following a taxonomical structure including kingdom, class, subclass, etc.) were assigned to all substances in the rat acute oral LD50 dataset. Class assignments were primarily used to categorise substances. If the number of substances associated with a class was too large (greater than 200), then a subclass assignment was used instead. Since many subclasses contained only 1 substance (164 subclasses contained 1–5 members), only those with more than 20 members were considered in this ‘local’ analysis (this still represented 5472 substances). Using the class and subclass assignments resulted in substances being aggregated into one of 88 different groups. The distribution of LD50 values on a per-group basis was plotted to gain a perspective of the variation in toxicity potencies. This, in turn, provided additional context when summarising the local performance of the GenRA predictions using both fingerprint sets.
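The grouping rule above can be expressed as a short function. This is a sketch under stated assumptions: the `(substance_id, class_name, subclass_name)` record layout is hypothetical, and the minimum-size filter is applied to every resulting group, whereas the text states it explicitly only for subclasses.

```python
from collections import Counter


def assign_groups(records, max_class_size=200, min_group_size=20):
    """Map substance id -> group label.

    records: iterable of (substance_id, class_name, subclass_name).
    A class with more than `max_class_size` members is split by subclass;
    groups with `min_group_size` or fewer members are dropped.
    """
    records = list(records)
    class_sizes = Counter(cls for _, cls, _ in records)
    labels = {}
    for sid, cls, sub in records:
        # Fall back to the subclass only when the class is too large.
        labels[sid] = sub if class_sizes[cls] > max_class_size else cls
    group_sizes = Counter(labels.values())
    return {sid: g for sid, g in labels.items()
            if group_sizes[g] > min_group_size}


# Tiny illustration with lowered thresholds (labels are made up).
records = [(1, "A", "a1"), (2, "A", "a1"), (3, "A", "a2"),
           (4, "B", "b1"), (5, "B", "b1")]
groups = assign_groups(records, max_class_size=2, min_group_size=1)
```

Here class "A" exceeds the size limit and is split into its subclasses, the single-member subclass "a2" is dropped, and class "B" is kept as a group in its own right.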

Figure 7 shows the variation in experimental LD50 values across all 88 groups. Phenoxyacetic acid derivatives (43 total) had the lowest spread in experimental values at 1.39 log molar units, whereas naphthalenes (122 total) had the highest spread at 5.78 log molar units. Benzimidazoles (148 total) had the highest median value at 1.17 log molar units and fatty alcohols (30 total) had the lowest median value at −1.357 log molar units. The median range across all the groups was 3.25 log molar units.

Figure 7.

Boxplot of the experimental LD50 values aggregated by ClassyFire group, sorted by smallest to largest median experimental LD50 values

In Figure 8, the mean absolute errors from ToxPrint or AIM-CSRML LD50 predictions were plotted to help identify whether there were any specific groups where one fingerprint type resulted in much better performance. Overall, a positive correlation was observed between the two fingerprint set predictions. Although some groups saw better performance (mean absolute errors less than 0.4), there was no pronounced difference in performance observed between the fingerprint sets for the majority of groups. Examples of groups where performance was noticeably worse in AIM-CSRML fragments relative to ToxPrints included organic dithiophosphoric acids and derivatives as well as alpha-halocarboxylic acids and derivatives, where the mean absolute errors for AIM-CSRML fragments were 0.931 and 0.932, respectively (compared to 0.735 and 0.607 for ToxPrints). In contrast, a group of benzodioxoles showed lower mean absolute errors using AIM-CSRML fragments (0.348) relative to ToxPrints (0.517). The summary performance metrics based on the 88 ClassyFire groups are provided in the supplementary information.

Figure 8.

Correlation in mean absolute errors arising from both fingerprint types across the 88 chemical groups. Isoindoles and derivatives are highlighted in red.
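The per-group comparison plotted in Figure 8 amounts to computing a mean absolute error for each group under each fingerprint set and then comparing the two. A minimal sketch follows; the `(group, observed, predicted)` row layout, the group names, and the numbers are hypothetical.

```python
from collections import defaultdict


def group_mae(rows):
    """rows: iterable of (group, observed, predicted) -> {group: MAE}."""
    errors = defaultdict(list)
    for group, observed, predicted in rows:
        errors[group].append(abs(observed - predicted))
    return {g: sum(e) / len(e) for g, e in errors.items()}


# Hypothetical predictions for the same substances from two fingerprint sets.
rows_aim = [("g1", 1.0, 1.5), ("g1", 2.0, 1.0), ("g2", 0.0, 0.25)]
rows_toxprint = [("g1", 1.0, 1.25), ("g1", 2.0, 1.75), ("g2", 0.0, 0.5)]

mae_aim = group_mae(rows_aim)
mae_toxprint = group_mae(rows_toxprint)

# Positive values mean the AIM-CSRML error is larger for that group.
mae_difference = {g: mae_aim[g] - mae_toxprint[g] for g in mae_aim}
```

Groups with the largest absolute difference are the natural candidates for the closer inspection described in the text.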

The group with the largest difference in absolute errors between the fingerprint sets was alpha-halocarboxylic acids and derivatives, with a value of 0.32. There was also a large difference in the R2 between the fingerprint sets: 0.377 for ToxPrints vs. −0.42 for AIM-CSRML. The smallest difference in absolute error was observed for the tetracarboxylic acids and derivatives group, but the R2 for both fragment sets was negative (i.e., the predictions were worse than using a simple mean).
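A negative R2 follows directly from the definition R2 = 1 − SS_res/SS_tot: predicting the mean of the observed values for every substance gives exactly 0, so any model whose squared errors exceed the variance around that mean scores below 0. The short sketch below illustrates this with made-up numbers.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


observed = [1.0, 2.0, 3.0]

perfect = r_squared(observed, [1.0, 2.0, 3.0])        # 1.0
mean_baseline = r_squared(observed, [2.0, 2.0, 2.0])  # 0.0
reversed_trend = r_squared(observed, [3.0, 2.0, 1.0])  # -3.0
```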

There were 11 groups where the R2 for ToxPrints was higher than found in the global analysis, with a maximum value of 0.68 for benzimidazoles. In contrast, the performance for AIM-CSRML fragments exceeded that of the global analysis for 14 groups, with the maximum R2 being 0.65 for benzimidazoles. There were eight groups common to both fragment sets where the local performance was better than found globally: benzimidazoles, nitrobenzenes, naphthalenes, fatty acid esters, fatty alcohols, prenol lipids, ethers, and organic carbonic acids and derivatives. ToxPrints were found to perform better in all but one of these eight, where AIM-CSRML fragments were associated with a higher R2 for the nitrobenzenes group.

There were 6 groups where the AIM-CSRML fragments, in contrast to ToxPrints, gave rise to improved local performance. These included aniline and substituted anilines, benzodioxoles, benzenesulfonyl compounds, anthracenes, diazines, and steroids and derivatives. For all except diazines and steroids and derivatives, the local AIM-CSRML performance exceeded both the global AIM-CSRML performance and the global ToxPrint performance.

Table 6 lists the 14 groups for which local performance exceeded global performance, and the fragment set giving rise to the better local performance relative to the global performances reported in Table 3. For these 14 groups listed in Table 6, there may be merit in choosing a specific fragment type when making LD50 estimates.

Table 6.

Groups with the best local performance

ClassyFire group | Preferred fragment set based on local performance
aniline and substituted anilines | AIM
benzodioxoles | AIM
benzenesulfonyl compounds | AIM
anthracenes | AIM
diazines | ToxPrints
steroid & derivatives | ToxPrints
benzimidazoles | ToxPrints
nitrobenzenes | AIM
naphthalenes | ToxPrints
fatty acid esters | ToxPrints
fatty alcohols | ToxPrints
prenol lipids | ToxPrints
ethers | ToxPrints
organic carbonic acids & derivatives | ToxPrints

Most of the groups (>70/88) saw poorer performance locally than globally for both fingerprint sets. The 5 worst-performing groups for each fingerprint set are presented in Table 7; two of these are common to both sets (nitrogen mustard compounds and phenylmethylamines).

Table 7.

Groups with the lowest performance

ToxPrints | AIM
allyl-type 1,3-dipolar organic compounds | alpha-halocarboxylic acids and derivatives
phenyl methylcarbamates | benzenesulfonamides
nitrogen mustard compounds | nitrogen mustard compounds
triazines | phenylmethylamines
phenylmethylamines | trifluoromethylbenzenes

In an effort to probe some of the reasons for the poor local predictions, selected groups with large differences in absolute errors between the fingerprint sets were reconsidered. One such group, illustrated herein, was that of isoindoles and derivatives, which is highlighted in red in Figure 8. Predictions for this group, consisting of 27 substances, had an absolute difference in error between the fingerprint sets of 0.136, with AIM-CSRML predictions performing better than the ToxPrint predictions based on R2 values (0.329 vs. 0.141).

The distribution of AIM-CSRML and ToxPrint features within this group relative to the whole dataset was explored to provide an indication of why the former set of features performed better (shown above the line in Figure 9). In Figure 9a, the within-group frequencies (y-axis) of some of the AIM-CSRML fragments were >0.8, meaning that more than 80% of the structures in this group contained that fragment. However, this did not account for why the performance using ToxPrints (Figure 9b) was that much poorer.

Figure 9.

Distribution of AIM-CSRML (a) or ToxPrint (b) fragments in the isoindoles and derivative group relative to the full dataset
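The quantity plotted on each axis of Figure 9 is the fraction of structures that contain a given fragment, computed once for the group and once for the full dataset. A minimal sketch, assuming fingerprints are stored as equal-length lists of 0/1 bits (the example matrix is made up):

```python
def fragment_frequencies(fingerprints):
    """Fraction of structures in which each fragment bit is set.

    fingerprints: list of equal-length 0/1 lists, one per structure.
    """
    n = len(fingerprints)
    return [sum(column) / n for column in zip(*fingerprints)]


# Four hypothetical structures described by three fragment bits.
dataset = [
    [1, 0, 1],
    [1, 1, 0],
    [0, 0, 1],
    [0, 0, 0],
]
group = dataset[:2]  # pretend the first two structures form one group

group_freq = fragment_frequencies(group)
dataset_freq = fragment_frequencies(dataset)
```

A fragment whose group frequency is well above its dataset frequency (the first bit here) is characteristic of the group, which is the pattern the plot is designed to expose.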

To that end, the individual differences in errors for structures within the group were explored to help identify reasons for discrepancies in predictive performance, e.g., where one feature set outperforms the other or whether there were outliers in one case and not the other (see Figure 10). A positive correlation between the two feature sets’ residuals was observed, with no notable outliers to account for the difference in performance. The substance marked in red was examined in more detail.

Figure 10:

Errors of the individual structures within the isoindoles and derivatives group.

The predictions for Folpet (DTXSID0021385, CASRN 133-07-3) using both fingerprint sets were examined by exploring its nearest neighbours (Figure 11). Although the LD50 values of the nearest neighbours according to the ToxPrints in Figure 11b exhibit a higher variance relative to the AIM-CSRML values in Figure 11a, the neighbours themselves appear more similar based on their Jaccard indices. This behaviour accounts for the differences in performance within a given group. Folpet is predicted better with AIM-CSRML fragments than ToxPrints, as shown by the black line in Figure 11.

Figure 11:

Scatterplot of the nearest neighbours by Jaccard similarity relative to experimental toxicity. The red dot represents the target chemical, DTXSID0021385, whereas the black line depicts the prediction.
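The interplay between neighbour similarity and neighbour toxicity values can be made concrete with a small sketch. It assumes that the read-across prediction is a Jaccard-similarity-weighted mean of the k nearest neighbours' values, in line with how GenRA is generally described; the fragment identifiers, values, and function names are hypothetical and this is not the genra-py implementation.

```python
def jaccard(a, b):
    """Jaccard similarity between two sets of fragment identifiers."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def weighted_prediction(target, neighbours, k=3):
    """Similarity-weighted mean of the k most similar neighbours.

    neighbours: list of (fragment_set, ld50_value) pairs.
    """
    scored = sorted(((jaccard(target, fp), value) for fp, value in neighbours),
                    key=lambda pair: pair[0], reverse=True)[:k]
    total = sum(sim for sim, _ in scored)
    if total == 0:
        raise ValueError("no neighbour shares a fragment with the target")
    return sum(sim * value for sim, value in scored) / total


# Hypothetical target and candidate analogues.
target = {"a", "b", "c", "d"}
neighbours = [
    ({"a", "b"}, 2.0),                                # similarity 0.5
    ({"a", "b", "c", "d", "e", "f", "g", "h"}, 4.0),  # similarity 0.5
    ({"x"}, 10.0),                                    # similarity 0.0
]
prediction = weighted_prediction(target, neighbours, k=2)
```

With equally similar neighbours the prediction is simply their average, so a wide spread in neighbour values translates directly into a less reliable estimate, which is the effect discussed for Folpet.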

Within the phenoxyacetic acid derivatives group, the predictions for MCPA-sodium (DTXSID2034700, CASRN 3653-48-3) using both fingerprint sets were examined by exploring its nearest neighbours (Figure 12). In this case, there are closer neighbours based on ToxPrints, with less variation in their LD50 values, accounting for the better prediction observed relative to that derived using AIM-CSRML fragments.

Figure 12:

Scatterplot of the nearest neighbours by Jaccard similarity relative to experimental toxicity. The red dot represents the target chemical, DTXSID2034700, whereas the black line depicts the prediction.

5.0. Summary

The AIM fragments were created over 20 years ago based on the relatively limited structure databases available at the time, although they were later used to profile the much larger AIMDB of more than 86,000 substances to create an indicative list of analogues. AIM has been in use for many years, but more recently, there have been challenges in using the AIM standalone tool since it is unusable on Windows OS currently supported by Microsoft. In addition, the closed nature of the AIM algorithm, lack of transparency and complete documentation, and reliance upon a non-standard fragment-encoding language incompatible with current SMILES and cheminformatics tools have all made the task of objectively assessing or updating the AIM approach difficult. Herein, a new CSRML file (AIM-CSRML) has been created to codify the historical AIM fragments in a machine-readable format that can be used with the public Chemotyper tool to both visualise structures with matched fragments, as well as generate fingerprint sets. Lacking definitive or transferable representations of the original AIM fragments, we attempted to use the available information to infer a set of CSRML fragments consistent with fragment images and descriptions in the AIM documentation, as well as reproducing the large indicative AIMDB fingerprint file. In this manner, an AIM-CSRML fingerprint file was created yielding a mean Jaccard similarity of over 80% with the original AIMDB fingerprint file, with high sensitivity and specificity values, 89.5% and 99.9%, respectively. Future work will consider benchmarking the AIM-CSRML against the version of AIM that is still in use by OPPT to evaluate whether there are additional rules based on properties such as LogKow that should be incorporated.

A working hypothesis was that AIM fragments were likely to be similar in terms of their coverage to another fragment set, ToxPrints, at least on the basis of ad hoc qualitative comparisons. An indirect comparison was made using a large inventory of 26,099 substances of regulatory interest to EPA. The comparison was indirect as the exact structural patterns for ToxPrints are not publicly accessible. The overall mean Jaccard similarities showed that the two sets of fragments were quite dissimilar, which would impact the analogues returned as part of a read-across approach and the toxicity predictions made.

Finally, a GenRA analysis was performed using a large set of substances with available rat acute oral LD50 values to evaluate the application of the fragment sets in a read-across context. The performance of the AIM and ToxPrint fragments appeared comparable, though a paired t-test on their resampled RMSE values showed the difference in performance to be statistically significant. Morgan fingerprints that had been used in the original GenRA analysis published in Helman et al. [6] yielded somewhat higher performance, as have subsequent analyses (see [22]). A comparison of the performance on a local basis made use of the ClassyFire taxonomy to subset the LD50 dataset into smaller categories commensurate with how a chemist might assign substances based on functional groups. The local performance within a subset of categories (14) was found to be higher than the global predictive performance. Specific examples of categories where the performance was improved were highlighted, together with which fingerprint type was preferred in those cases. The mean absolute errors on a per-group basis for the AIM-CSRML and ToxPrint fingerprints were highly correlated, which masked the underlying reasons for the differences observed. Looking at groups where the differences were larger highlighted the likely reasons behind the prediction discrepancies. In the two examples selected for illustrative purposes, the main reasons behind the predictions being better or worse were simply how variable the toxicity data for the nearest neighbours were and whether neighbours with a high Jaccard similarity could be identified.

Given the dissimilarity between the 2 fragment sets and how they performed in prediction of LD50, future work will consider the merits of a consensus set of AIM-CSRML and ToxPrint fragments for use in analogue identification and read-across applications.

Supplementary Material

Supplement1

Highlights.

  • Developed a CSRML version of AIM fragments.

  • Evaluated performance of AIM fragments relative to the AIM database.

  • Compared AIM and ToxPrint fragments against a chemistry list of interest to EPA.

  • Used AIM and ToxPrint fragments in the prediction of LD50 with GenRA.

Footnotes

Disclaimer

The views expressed in this article are those of the authors and do not necessarily reflect the views or policies of the U.S. Environmental Protection Agency. Mention of trade names or commercial products does not constitute endorsement or recommendation for use.

References

  • [1]. Patlewicz G, Helman G, Pradeep P, Shah I, Navigating through the minefield of read-across tools: A review of in silico tools for grouping, Comput Toxicol. 3 (2017) 1–18. 10.1016/j.comtox.2017.05.003.
  • [2]. LCSA, Frank R. Lautenberg Chemical Safety for the 21st Century Act, (2016). https://www.congress.gov/114/plaws/publ182/PLAW-114publ182.pdf.
  • [3]. Shah I, Liu J, Judson RS, Thomas RS, Patlewicz G, Systematically evaluating read-across prediction and performance using a local validity approach characterized by chemical structure and bioactivity information, Regulatory Toxicology and Pharmacology. 79 (2016) 12–24. 10.1016/j.yrtph.2016.05.008.
  • [4]. Helman G, Shah I, Williams AJ, Edwards J, Dunne J, Patlewicz G, Generalized Read-Across (GenRA): A workflow implemented into the EPA CompTox Chemicals Dashboard, ALTEX - Alternatives to Animal Experimentation. 36 (2019) 462–465. 10.14573/altex.1811292.
  • [5]. Yang C, Tarkhov A, Marusczyk J, Bienfait B, Gasteiger J, Kleinoeder T, Magdziarz T, Sacher O, Schwab CH, Schwoebel J, Terfloth L, Arvidson K, Richard A, Worth A, Rathman J, New publicly available chemical query language, CSRML, to support chemotype representations for application to data mining and modeling, J Chem Inf Model. 55 (2015) 510–528. 10.1021/ci500667v.
  • [6]. Helman G, Shah I, Patlewicz G, Transitioning the Generalised Read-Across approach (GenRA) to quantitative predictions: A case study using acute oral toxicity data, Comput Toxicol. 12 (2019). 10.1016/j.comtox.2019.100097.
  • [7]. Landrum G, RDKit: Open-source cheminformatics; http://www.rdkit.org, (n.d.).
  • [8]. Williams AJ, Grulke CM, Edwards J, McEachran AD, Mansouri K, Baker NC, Patlewicz G, Shah I, Wambaugh JF, Judson RS, Richard AM, The CompTox Chemistry Dashboard: a community data resource for environmental chemistry, J Cheminform. 9 (2017) 61. 10.1186/s13321-017-0247-6.
  • [9]. Richard AM, Huang R, Waidyanatha S, Shinn P, Collins BJ, Thillainadarajah I, Grulke CM, Williams AJ, Lougee RR, Judson RS, Houck KA, Shobair M, Yang C, Rathman JF, Yasgar A, Fitzpatrick SC, Simeonov A, Thomas RS, Crofton KM, Paules RS, Bucher JR, Austin CP, Kavlock RJ, Tice RR, The Tox21 10K Compound Library: Collaborative Chemistry Advancing Toxicology, Chem. Res. Toxicol 34 (2021) 189–216. 10.1021/acs.chemrestox.0c00264.
  • [10]. Richard AM, Judson RS, Houck KA, Grulke CM, Volarath P, Thillainadarajah I, Yang C, Rathman J, Martin MT, Wambaugh JF, Knudsen TB, Kancherla J, Mansouri K, Patlewicz G, Williams AJ, Little SB, Crofton KM, Thomas RS, ToxCast Chemical Landscape: Paving the Road to 21st Century Toxicology, Chem Res Toxicol. 29 (2016) 1225–1251. 10.1021/acs.chemrestox.6b00135.
  • [11]. Shah I, Tate T, Patlewicz G, Generalised Read-Across prediction using genra-py, Bioinformatics. (2021) btab210. 10.1093/bioinformatics/btab210.
  • [12]. Rogers D, Hahn M, Extended-connectivity fingerprints, J Chem Inf Model. 50 (2010) 742–754. 10.1021/ci100050t.
  • [13]. Djoumbou Feunang Y, Eisner R, Knox C, Chepelev L, Hastings J, Owen G, Fahy E, Steinbeck C, Subramanian S, Bolton E, Greiner R, Wishart DS, ClassyFire: automated chemical classification with a comprehensive, computable taxonomy, J Cheminform. 8 (2016) 61. 10.1186/s13321-016-0174-y.
  • [14]. Anaconda Software Distribution, Anaconda Documentation., (2020). https://docs.anaconda.com/.
  • [15]. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, Vanderplas J, Passos A, Cournapeau D, Brucher M, Perrot M, Duchesnay É, Scikit-learn: Machine Learning in Python, J. Mach. Learn. Res 12 (2011) 2825–2830.
  • [16]. The Pandas Development Team. 2020. pandas-dev/pandas: Pandas. 10.5281/zenodo.3509134.
  • [17]. Harris CR, Millman KJ, van der Walt SJ, Gommers R, Virtanen P, Cournapeau D, Wieser E, Taylor J, Berg S, Smith NJ, Kern R, Picus M, Hoyer S, van Kerkwijk MH, Brett M, Haldane A, del Río JF, Wiebe M, Peterson P, Gérard-Marchant P, Sheppard K, Reddy T, Weckesser W, Abbasi H, Gohlke C, Oliphant TE, Array programming with NumPy, Nature. 585 (2020) 357–362. 10.1038/s41586-020-2649-2.
  • [18]. Hunter JD, Matplotlib: A 2D Graphics Environment, Computing in Science & Engineering. 9 (2007) 90–95. 10.1109/MCSE.2007.55.
  • [19]. Waskom ML, seaborn: statistical data visualization, Journal of Open Source Software. 6 (2021) 3021. 10.21105/joss.03021.
  • [20]. Virtanen P, Gommers R, Oliphant TE, Haberland M, Reddy T, Cournapeau D, Burovski E, Peterson P, Weckesser W, Bright J, van der Walt SJ, Brett M, Wilson J, Millman KJ, Mayorov N, Nelson ARJ, Jones E, Kern R, Larson E, Carey CJ, Polat İ, Feng Y, Moore EW, VanderPlas J, Laxalde D, Perktold J, Cimrman R, Henriksen I, Quintero EA, Harris CR, Archibald AM, Ribeiro AH, Pedregosa F, van Mulbregt P, SciPy 1.0: fundamental algorithms for scientific computing in Python, Nat Methods. 17 (2020) 261–272. 10.1038/s41592-019-0686-2.
  • [21]. Kluyver T, Jupyter Notebooks – a publishing format for reproducible computational workflows. In Loizides F & Schmidt B, eds. Positioning and Power in Academic Publishing: Players, Agents and Agendas. pp. 87–90, 2016.
  • [22]. Mansouri K, Karmaus AL, Fitzpatrick J, Patlewicz G, Pradeep P, Alberga D, Alepee N, Allen TEH, Allen D, Alves VM, Andrade CH, Auernhammer TR, Ballabio D, Bell S, Benfenati E, Bhattacharya S, Bastos JV, Boyd S, Brown JB, Capuzzi SJ, Chushak Y, Ciallella H, Clark AM, Consonni V, Daga PR, Ekins S, Farag S, Fedorov M, Fourches D, Gadaleta D, Gao F, Gearhart JM, Goh G, Goodman JM, Grisoni F, Grulke CM, Hartung T, Hirn M, Karpov P, Korotcov A, Lavado GJ, Lawless M, Li X, Luechtefeld T, Lunghini F, Mangiatordi GF, Marcou G, Marsh D, Martin T, Mauri, Muratov EN, Myatt GJ, Nguyen DT, Nicolotti O, Note R, Pande P, Parks AK, Peryea T, Polash AH, Rallo R, Roncaglioni A, Rowlands C, Ruiz P, Russo DP, Sayed A, Sayre R, Sheils T, Siegel C, Silva AC, Simeonov A, Sosnin S, Southall N, Strickland J, Tang Y, Teppen B, Tetko IV, Thomas D, Tkachenko V, Todeschini R, Toma C, Tripodi I, Trisciuzzi D, Tropsha A, Varnek A, Vukovic K, Wang Z, Wang L, Waters KM, Wedlake AJ, Wijeyesakere SJ, Wilson D, Xiao Z, Yang H, Zahoranszky-Kohalmi G, Zakharov AV, Zhang FF, Zhang Z, Zhao T, Zhu H, Zorn KM, Casey W, Kleinstreuer NC. CATMoS: Collaborative Acute Toxicity Modeling Suite. Environ Health Perspect. 2021. 129(4):47013. doi: 10.1289/EHP8495. Erratum in: Environ Health Perspect. 2021 129(7):79001. Erratum in: Environ Health Perspect. 2021 129(10):109001.
  • [23]. Edwards SW, Nelms M, Hench VK, Ponder J, Sullivan K. Mapping Mechanistic Pathways of Acute Oral Systemic Toxicity Using Chemical Structure and Bioactivity Measurements. Frontiers in Toxicology 2022. 4. 10.3389/ftox.2022.824094.
