Author manuscript; available in PMC: 2024 May 1.
Published in final edited form as: Comput Toxicol. 2023 May;26:10.1016/j.comtox.2023.100271. doi: 10.1016/j.comtox.2023.100271

Evaluating the utility of a high throughput thiol-containing fluorescent probe to screen for reactivity: A case study with the Tox21 library

Grace Patlewicz a,*, Katie Paul-Friedman a, Keith Houck a, Li Zhang b, Ruili Huang b, Menghang Xia b, Jason Brown a, Steven O Simmons a
PMCID: PMC10304587  NIHMSID: NIHMS1895630  PMID: 37388277

Abstract

High-throughput screening (HTS) assays for bioactivity in the Tox21 program aim to evaluate an array of different biological targets and pathways, but a significant barrier to interpretation of these data is the lack of HTS assays intended to identify non-specific reactive chemicals. Such assays are important for prioritising chemicals to test in specific assays, identifying promiscuous chemicals based on their reactivity, and addressing hazards such as skin sensitisation which are not necessarily initiated by a receptor-mediated effect but act through a non-specific mechanism. Herein, a fluorescence-based HTS assay that allows the identification of thiol-reactive compounds was used to screen 7,872 unique chemicals in the Tox21 10K chemical library. Active chemicals were compared with profiling outcomes using structural alerts encoding electrophilic information. Random Forest classification models based on chemical fingerprints were developed to predict assay outcomes and evaluated through 10-fold stratified cross validation (CV). The mean CV Balanced Accuracy of the validation set was 0.648. The model developed shows promise as a tool to screen untested chemicals for their potential electrophilic reactivity based solely on chemical structural features.

Keywords: reactivity assay, reaction mechanistic domains, Tox21, in silico prediction models

1.0. Introduction

To date, the Tox21 screening programme has screened over 70 cell-based assays against the Tox21 10k chemical library using a quantitative High Throughput Screening (qHTS) format [1–3]. Existing HTS assays that have been applied in both the Tox21 and ToxCast screening programmes are largely tailored towards specific receptor and signaling pathway effects. Currently available Tox21 assays include various nuclear receptor assays such as those for estrogen and androgen, assays that characterise enzyme activity and membrane integrity, as well as different toxicity pathways (see https://ncats.nih.gov/tox21/projects/assays for a full listing of current assays). A number of the Tox21 assays have also been the subject of various machine learning studies to develop prediction models on the basis of chemical structure. Notable examples include various studies building prediction models for the 12 Tox21 nuclear receptor signaling toxicity assays, the so-called Tox21 challenge dataset [4–7]. Recent reviews describing the state of the art of machine learning in predictive toxicity include Wang et al. [8], who discussed the most common machine learning techniques used for classification models, as well as Cavasotto and Scardino [9], who outlined recent progress in the approaches used to predict a range of different toxicity endpoints (cardiotoxicity, acute oral toxicity, hepatotoxicity and Tox21 assay outcomes) and their chemical representations.

A clear need in interpreting HTS bioactivity data is an ability to distinguish specific signal in targeted screening assays from non-specific signal arising from signal detection interference, cytotoxicity, and/or reactivity. Recognition of the importance of predicting chemicals that cause assay interference resulting in false positives has motivated the development of HTS assays for autofluorescence and machine learning models to predict this interference on the basis of chemical properties and structure descriptors [10–12]. Previous work aimed at understanding selective bioactivity that occurs in the absence of confounding interference from cytotoxicity has made use of a number of cell viability and cytotoxicity assays [13]. Machine learning and structure-activity models have also been derived to predict cytotoxicity [14,15]. However, there are no HTS assays used in the Tox21 programme that characterise non-specific reactivities, another important dimension of understanding the non-specific and/or promiscuous responses across in vitro assays. Chemical reactivity is relevant for prediction of a variety of adverse effects in vivo such as skin sensitisation, liver, and kidney toxicities mediated by chemical electrophiles binding to cellular nucleophiles. Measurement and estimation of electrophilic reactivity for predictive toxicity has been an area of study for many years. Schultz et al. [16] outlined a conceptual framework for reactivity and how this was linked to the molecular initiating events as part of an Adverse Outcome Pathway for different endpoints [17]. Schwöbel et al. [18] summarised the state of the art in terms of the different computational and experimental approaches that exist to characterise electrophilic reactivity including reaction mechanistic domains [19,20] based on organic chemistry principles through to different in chemico assays.
For a comprehensive review of how chemical reactivity relates to toxicity, the reader is referred to Cronin et al. [21] and LoPachin et al. [22].

Herein, results from screening the Tox21 library using the fluorescence-based (E)-2-(4-mercaptostyryl)-1,3,3-trimethyl-3H-indol-1-ium (MSTI) assay developed by McCallum et al. [23], which enables the identification of thiol-reactive small molecules in a high-throughput manner, are presented and used to inform structure-activity relationships.

In our earlier manuscript [24], we compared and contrasted several published protein binding structural alert schemes for 5 major reaction mechanism domains [20] (Michael acceptors (MA), Schiff base formers (SB), Acyl transfer agents (Acyl), SN2 (substitution nucleophilic bi-molecular), SNAr (nucleophilic aromatic substitution)) relative to manual expert assignments that had been annotated in a published skin sensitisation database [25]. The motivation at the time was to better understand the scope and performance of alert schemes that could provide an indication of potential reactivity. Based on a ‘consensus’ performance, the same schemes were then used to profile the Tox21 10K chemical library to predict which substances were most likely to be reactive. A ToxPrint (TxP) enrichment analysis was also performed to identify which TxPs were particularly enriched within each reaction domain. Herein, we compare the MSTI experimental results with the reaction domain predictions made previously. Furthermore, we evaluated the TxP enrichments from the MSTI results and contrasted these with those enrichments previously identified. Finally, machine learning approaches were applied, utilising structural features to build statistical quantitative structure-activity relationship (QSAR) models to predict the MSTI outcome of untested substances. This predictive modelling approach could assist in the detection of environmental substances that might produce confounding results in other high-throughput screening results as well as provide additional useful structural insight for identifying reactive substances.

The aims of this study are therefore:

  1. Summarise the results of the MSTI assay screening performed on the Tox21 10K chemical library;

  2. Compare the profiling outcomes from Nelms et al. [24] with the MSTI assay results;

  3. Perform an enrichment analysis using TxPs on the Tox21 MSTI screening results to identify significantly enriched structural features;

  4. Derive a structure-based model to predict the MSTI assay outcome for untested substances; and

  5. Use the model to profile a chemical inventory of regulatory interest to demonstrate practical utility.

2.0. Materials and Methods

2.1. Chemical library

The Tox21 chemical library contains approximately 10,000 samples including pesticides, industrial chemicals and food additives commercially sourced by the EPA chemical vendor, Evotec [3]. At the time the MSTI assay was run, the library contained physical samples for 7,872 unique substances that were available for screening.

2.2. MSTI assay

The MSTI assay was conducted as described previously [23] with some modification. Briefly, 4 µL of the fluorescent probe (E)-2-(4-mercaptostyryl)-1,3,3-trimethyl-3H-indol-1-ium iodide at 4 µM in PBS containing 2% DMSO, or 2% DMSO in PBS only, was dispensed into medium-binding black/solid bottom 1,536-well plates (Greiner Bio-One North America Inc., Monroe, NC) by BioRAPTR Flying Reagent Dispenser (FRD, Beckman Coulter, Inc., Brea, CA).

Then 23 nL of test compound, positive control ((E)-3-(3,5-dibromo-2-hydroxyphenyl)-1-(5-methylfuran-2-yl)prop-2-en-1-one; MLS001163887; CASRN 374091-47-1) (MLS), or DMSO only was transferred twice to the assay plates using a Pintool station (Wako Automation, San Diego, CA) to achieve a final compound concentration up to 230 µM. Most of the chemicals were screened at 230 µM; about 30% of the chemicals had lower stock concentrations and were screened at ~115 µM or lower. After 15 seconds of centrifugation at 1000 rpm, the assay plates were incubated at room temperature for 1 hour. The fluorescence intensity of the assay plates was measured using an Envision plate reader (Perkin Elmer, Waltham, MA) at 525 nm excitation and 598 nm emission. Data were expressed as relative fluorescence units.

2.3. MSTI Processing

Raw fluorescence reads were normalised to the positive control, MLS (−100%) and solvent (DMSO+MSTI) wells (0%) as follows:

$$\%\,\text{Activity} = \frac{V_{\text{compound}} - V_{\text{DMSO}}}{V_{\text{DMSO}} - V_{\text{pos}}} \times 100$$

where $V_{\text{compound}}$ denotes the compound well values, $V_{\text{pos}}$ denotes the median value of the positive control wells, and $V_{\text{DMSO}}$ denotes the median value of the solvent wells. The data set was then corrected using the solvent compound plates at the beginning and end of the compound plate stack by applying an in-house pattern correction algorithm [8].
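As a rough illustration of the normalisation above, the following minimal Python sketch computes the percent activity for a single well. The function name `percent_activity` and the raw reads are hypothetical and are not taken from the study; the pattern correction step is not reproduced.

```python
from statistics import median


def percent_activity(v_compound, dmso_wells, pos_wells):
    """Normalise a raw fluorescence read to % activity.

    0% corresponds to the median solvent (DMSO + MSTI) well and
    -100% to the median positive-control well.
    """
    v_dmso = median(dmso_wells)
    v_pos = median(pos_wells)
    return (v_compound - v_dmso) / (v_dmso - v_pos) * 100.0


# Hypothetical raw reads (relative fluorescence units) for one plate
dmso = [1000.0, 1020.0, 980.0]
pos = [400.0, 420.0, 380.0]

activity_solvent_like = percent_activity(1000.0, dmso, pos)   # behaves like solvent
activity_control_like = percent_activity(400.0, dmso, pos)    # behaves like MLS
activity_midway = percent_activity(700.0, dmso, pos)          # halfway between
```

A well that reads the same as the solvent median maps to 0%, and a well that reads the same as the positive-control median maps to −100%, in line with the definition in the text.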

Normalised data for the Tox21 MSTI p2 activity assay were downloaded from the Tox21 data repository (https://tripod.nih.gov/tox/) and pipelined using the ToxCast Pipeline (tcpl) package [26]. These data were generated using a single concentration (sc) of chemical, and as such only sc processing from sc level 0 to sc level 2 was completed in tcpl. The resultant endpoint was named “TOX21_msti_p2_activity” and corresponds to ToxCast assay endpoint id 2543 [i.e. the probe in PBS containing 2% DMSO]. As this assay response is decreased when positive, at sc level 1, response data from Tox21 were multiplied by negative 1 such that all responses would appear in the positive direction (per tcpl v2 which only interprets and fits data in the positive direction). MLS at 30 µM was used as a positive control on each plate (well type “p” in tcpl), with an approximate median response of 100%, confirming that the assay and data pipelining were correct. At sc level 2, baseline sampling variability for the assay endpoint was defined using the DMSO wells with an activity threshold set at 3*baseline median absolute deviation (a robust measure of the spread of the DMSO well responses, centred around zero), equal to 13.9%. The TOX21_msti_p2_activity assay endpoint was only considered interpretable in a single direction. Any sample with activity greater than this threshold was classified as a positive response with a hit-call equal to 1.
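The sign flip and thresholding steps can be sketched as follows. This is a dependency-free illustration, not the tcpl implementation: the helper names `mad` and `hit_calls` are hypothetical, the data are invented, and the sketch uses an unscaled median absolute deviation (whether a rescaling constant applies is left as an assumption).

```python
from statistics import median


def mad(values):
    """Unscaled median absolute deviation about the median."""
    centre = median(values)
    return median(abs(v - centre) for v in values)


def hit_calls(responses, dmso_responses, multiplier=3.0):
    """Flip the response direction and call hits above multiplier * baseline MAD."""
    threshold = multiplier * mad(dmso_responses)
    flipped = [-r for r in responses]  # loss of signal becomes a positive response
    calls = [1 if r > threshold else 0 for r in flipped]
    return threshold, calls


# Hypothetical normalised responses (% activity)
dmso_wells = [-4.0, -2.0, 0.0, 2.0, 4.0]
samples = [-50.0, -5.0, 3.0, -6.0]
threshold, calls = hit_calls(samples, dmso_wells)
```

With the baseline median absolute deviation of 4.63 reported in the text, the same rule gives 3 × 4.63 = 13.89, which rounds to the 13.9% threshold quoted above.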

The variance in the DMSO control wells was low (median absolute deviation of 4.63 across all plates run, as noted above in calculating the 3*baseline median absolute deviation). Assay quality was further explored via calculation of a robust Z’ (rZ’) value, calculated using the normalised data and the following equation:

$$rZ'_i = 1 - \frac{3\,(pmad_i + bmad_i)}{\left|\tilde{X}_{pos,i} - \tilde{X}_{DMSO,i}\right|}$$

where $pmad_i$ is the rescaled median absolute deviation of the normalised positive control (MLS response) values on the ith plate, $bmad_i$ is the rescaled median absolute deviation of the normalised DMSO values on the ith plate, $\tilde{X}_{pos,i}$ is the median of the positive controls on the ith plate, and $\tilde{X}_{DMSO,i}$ is the median of the normalised DMSO values on the ith plate. The rZ’ uses the rescaled median absolute deviation of the control values in place of the standard deviation, and the median of the positive and vehicle control values in place of the mean. A rZ’ of 0.5–1.0 corresponds to an assay with sufficiently high signal-to-background capability and an intersample variability that is low enough to distinguish positive and negative chemicals in screening, with the ability to distinguish signal from baseline increasing as the rZ’ approaches 1.0.
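A minimal sketch of the per-plate rZ’ calculation is given below. It assumes the "rescaled" median absolute deviation uses the conventional consistency constant 1.4826 (an assumption; the text does not state the constant), and the control values are invented for illustration.

```python
from statistics import median


def scaled_mad(values, constant=1.4826):
    """Median absolute deviation rescaled by a consistency constant (assumed 1.4826)."""
    centre = median(values)
    return constant * median(abs(v - centre) for v in values)


def robust_z_prime(pos_values, dmso_values):
    """rZ' = 1 - 3 * (pmad + bmad) / |median(pos) - median(DMSO)| for one plate."""
    spread = scaled_mad(pos_values) + scaled_mad(dmso_values)
    window = abs(median(pos_values) - median(dmso_values))
    return 1.0 - 3.0 * spread / window


# Hypothetical normalised control wells on one plate (% activity)
pos_controls = [-102.0, -100.0, -98.0]
dmso_controls = [-2.0, 0.0, 2.0]
rz = robust_z_prime(pos_controls, dmso_controls)
```

For these toy values the result falls in the 0.5 to 1.0 range that the text associates with an assay able to separate positives from baseline.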

There were 7,872 unique substances within the 9,670 total samples due to internal replication for some substances (1,367 substances in duplicate and 215 substances in triplicate; for triplicate substances, one of the replicates was typically from a different solubilisation or plating). To identify the most appropriate aggregation strategy for replicate values, a single assay analysis was conducted. The analysis was performed by re-sampling single assay outcomes from substances with multiple MSTI hit-call outcomes available. The following steps were carried out:

  1. The dataset was filtered to extract substances with more than 1 MSTI outcome;

  2. For each substance, one of the MSTI outcomes (drawn from either the duplicate or triplicate set of outcomes) was randomly sampled. If the outcome was positive, the substance was ‘categorised’ as positive, otherwise negative;

  3. This was repeated 500 times, resulting in 500 replicates for the MSTI outcome of a given substance. An unalikeability coefficient as described by Perry and Kader [27] was used as the measure of the variability and is defined as how often observations differ from one another. Assuming the hit-call in the MSTI assay is a Bernoulli trial, the mean of a Bernoulli variable is expressed as p (the proportion of positives), the variance as p*(1-p) and the unalikeability as 2*variance. Inspecting the distribution of the resampled unalikeability values would provide some insight into how hit-calls might best be summarised (e.g. minimum, median or maximum values). If 90% or more of the unalikeability values were ‘0’ (observations not differing from one another) then the median value of the MSTI hit-calls would be used when aggregating the results; otherwise the maximum value would be used.
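The resampling steps above can be sketched in a few lines of Python. This is an illustrative reading of the procedure rather than the authors' code: the function names, the substance labels and the fixed random seed are all assumptions.

```python
import random


def unalikeability(outcomes):
    """Unalikeability of binary hit-calls: 2 * p * (1 - p)."""
    p = sum(outcomes) / len(outcomes)
    return 2.0 * p * (1.0 - p)


def resampled_unalikeability(replicate_calls, n_draws=500, seed=0):
    """Draw one replicate outcome at random n_draws times and score the draws."""
    rng = random.Random(seed)
    draws = [rng.choice(replicate_calls) for _ in range(n_draws)]
    return unalikeability(draws)


def choose_aggregation(replicates_by_substance, zero_fraction=0.90):
    """Return 'median' if at least zero_fraction of substances score 0, else 'max'."""
    scores = [resampled_unalikeability(calls)
              for calls in replicates_by_substance.values()]
    share_zero = sum(score == 0.0 for score in scores) / len(scores)
    return "median" if share_zero >= zero_fraction else "max"


# Hypothetical replicate hit-calls keyed by made-up substance labels
replicates = {
    "substance_A": [1, 1],
    "substance_B": [0, 0, 0],
    "substance_C": [1, 0],
}
strategy = choose_aggregation(replicates)
```

Substances whose replicates all agree score exactly 0, so the share of zeros directly reflects how many replicated substances gave concordant hit-calls.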

2.4. Data Preparation for QSAR modelling

Chemicals used in this study were represented by unique DSSTox Substance Identifiers (DTXSID) [28]. A batch search of EPA’s CompTox Chemicals Dashboard [29] was conducted to retrieve SMILES, QSAR-READY SMILES, Preferred Names, and CAS registry numbers (CASRN). QSAR-READY SMILES are SMILES that have been standardised to remove metal ions, salt fragments and stereochemistry. Chemical substances in the DSSTox database have been curated and standardised to ensure correctness in chemical structure as well as their associations to chemical names and other identifiers such as CAS registry numbers. Examples of this curation include checking for errors and mismatches in chemical structure formats and mapping to identifiers, as well as structure validation issues like hyper-valency, etc. From the 7,872 substances tested, SMILES were available for 7,532 substances, which were used for subsequent data analysis.

2.5. Landscape Projection

To visualise the structural relationships among substances, the TxP fingerprints were projected into two dimensions using two different approaches: 1) t-distributed stochastic neighbour embedding (t-SNE) and 2) generative topographic mapping (GTM). t-SNE [30] is an unsupervised dimensionality reduction technique where the focus is on keeping very similar data points close together in lower dimensional space. It preserves local structure of the data using a Student's t-distribution to compute the similarity between 2 points in lower dimensional space. GTM is a probabilistic extension of self organising maps that was developed by Bishop et al. [31]. In contrast to t-SNE where data in initial multidimensional space are projected into two-dimensional (2D) space, the GTM model is defined in terms of a mapping from the latent space into the data space. The mapping is then inverted using Bayes’ theorem, giving rise to a posterior distribution in latent space, which provides the data visualisation.

The TxP features were first converted to principal components to reduce the number of dimensions, which suppresses noise and speeds up the computation of the pairwise distances between the substances. The projections were then plotted and overlaid with the hit-call outcome to help explore the extent to which clusters of structurally similar substances were associated with positive responses.

2.6. Chemotype ToxPrint enrichment

Corina Symphony on the command line (licensed from Molecular Networks GmbH and Altamira LLC) was used to compute the 729 public TxPs [32]. Fisher’s exact test was used to compute an odds ratio and associated p value for each TxP relative to the active or inactive outcomes in the MSTI assay. This was comparable with the methodology discussed in Wang et al. [33,34]. A TxP was considered enriched if it had an odds ratio greater than or equal to 3, a one-sided Fisher’s exact p-value less than 0.05 (probability value of the odds ratio being greater than 1) and a number of True Positives (TP) greater than or equal to 3. P-values were corrected using the Benjamini and Hochberg [35] technique to control the false discovery rate (FDR). FDR-controlling procedures provide less stringent control of Type I errors compared to family-wise error rate (FWER) controlling procedures (such as the Bonferroni correction (https://en.wikipedia.org/wiki/Bonferroni_correction)).
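The enrichment criteria can be illustrated with a dependency-free sketch. It is not the authors' implementation: the function names are hypothetical, the one-sided p-value is computed directly from the hypergeometric tail, and the sketch applies the 0.05 cut-off to whichever p-value is passed in, since the text does not specify whether the raw or corrected value was thresholded. The counts in the example are taken from two rows of Table 1.

```python
from math import comb, inf


def odds_ratio(tp, fn, fp, tn):
    """Odds ratio of the 2x2 table; infinite when a zero cell sits in the denominator."""
    return inf if fn * fp == 0 else (tp * tn) / (fn * fp)


def fisher_one_sided(tp, fn, fp, tn):
    """P(X >= tp) under the hypergeometric null (alternative: odds ratio > 1)."""
    n_with_txp = tp + fp
    n_active = tp + fn
    total = tp + fn + fp + tn
    upper = min(n_with_txp, n_active)
    tail = sum(comb(n_active, k) * comb(total - n_active, n_with_txp - k)
               for k in range(tp, upper + 1))
    return tail / comb(total, n_with_txp)


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):          # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def is_enriched(tp, fn, fp, tn, p_value):
    """Apply the three criteria: OR >= 3, p < 0.05 and TP >= 3."""
    return odds_ratio(tp, fn, fp, tn) >= 3 and p_value < 0.05 and tp >= 3


# Counts (TP, FN, FP, TN) from Table 1
benzidine = (13, 2172, 1, 5346)
azo_aliphatic = (4, 2181, 0, 5347)
or_benzidine = odds_ratio(*benzidine)
p_benzidine = fisher_one_sided(*benzidine)
p_azo = fisher_one_sided(*azo_aliphatic)
```

In practice the adjustment would be run over the p-values of all 729 TxPs at once; `benjamini_hochberg` takes that full list.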

The TxPs identified were rationalised with reference to the alert schemes previously evaluated in Nelms et al. [24] as well as the profilers existing within the OECD Toolbox (qsartoolbox.org). TxPs were inspected using the Chemotyper tool (chemotyper.org, a freely available tool from Molecular Networks) to interpret the features based on their structural depictions.

TxPs were also used as the descriptor set for subsequent QSAR modelling.

2.7. QSAR Modelling

Classification QSAR models were developed to discriminate between an active (reactive) and inactive (unreactive) response in the MSTI assay. The modelling workflow was similar to other published work [36,37].

2.8. Machine Learning Approach

Eight machine learning-based approaches as implemented in the Python library scikit-learn [38] were investigated in an effort to derive a QSAR classification model to predict MSTI activity. These were chosen to cover a range of linear and nonlinear approaches and were primarily used to establish the range in expected performance. The approaches (described in more detail in [39,40]) were namely: support vector machine with linear and radial basis kernels, linear discriminant analysis (LDA), Ridge Classifier, K-nearest Neighbours, Gaussian Naive Bayes, Random Forest, Logistic Regression and Neural Networks. A ‘naive’ classifier (using the Dummy Classifier within scikit-learn with ‘most_frequent’ as the strategy), which predicts a single class for all of the substances regardless of their original class, was first run on the dataset to serve as a simple baseline. This ‘naive’ classifier does not generate any insight about the data since it is completely independent of the training input parameters.

TxPs that had a null variance or near constant value across substances were removed from further consideration. The VarianceThreshold function within scikit-learn with a threshold of 0.005 was used, such that TxPs where 99.5% of the values across the chemicals were similar were dropped. The eight machine learning approaches helped to identify the most promising model(s) as assessed by balanced accuracy (BA). The model with the highest performance was then tuned using a grid optimisation on a 10-fold stratified cross-validation procedure to maximise BA performance.
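To make the variance filter concrete: for a binary fingerprint bit present in a fraction p of substances, the variance is p*(1-p), so a threshold of 0.005 removes bits present in roughly 0.5% of substances or fewer (or absent in that few). The sketch below mimics this filter in plain Python on invented columns; it is not the scikit-learn implementation, and the helper name and TxP labels are hypothetical.

```python
from statistics import pvariance


def drop_near_constant(columns, threshold=0.005):
    """Keep only fingerprint columns whose population variance exceeds the threshold."""
    return {name: bits for name, bits in columns.items()
            if pvariance(bits) > threshold}


# Hypothetical binary TxP columns across 1,000 substances
columns = {
    "txp_constant": [0] * 1000,               # variance 0
    "txp_rare": [1] * 4 + [0] * 996,          # variance 0.004 * 0.996 = 0.003984
    "txp_common": [1] * 100 + [0] * 900,      # variance 0.1 * 0.9 = 0.09
}
kept = drop_near_constant(columns)
```

Only the column with appreciable variation survives, which is the intended effect of removing null-variance and near-constant TxPs before modelling.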

The performance of the base models and final model was evaluated using BA. Other characteristics namely number of true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN) were also derived to compute the sensitivity (SE, recall), specificity (SP), precision, F1 score (harmonic mean of Precision and Recall) and Matthews Correlation Coefficient (MCC).

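$$SE = \frac{TP}{TP + FN}$$

$$SP = \frac{TN}{TN + FP}$$

$$Precision = \frac{TP}{TP + FP}$$

$$MCC = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$$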
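The metrics above, together with balanced accuracy (the mean of SE and SP) and the F1 score, can be computed from the four confusion-matrix counts as in the following sketch. The function name and the example counts are illustrative only, and the sketch does not guard against empty classes (zero denominators).

```python
from math import sqrt


def classification_metrics(tp, tn, fp, fn):
    """Derive SE, SP, precision, BA, F1 and MCC from confusion-matrix counts."""
    se = tp / (tp + fn)                      # sensitivity (recall)
    sp = tn / (tn + fp)                      # specificity
    precision = tp / (tp + fp)
    ba = (se + sp) / 2                       # balanced accuracy
    f1 = 2 * precision * se / (precision + se)
    mcc = (tp * tn - fp * fn) / sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {"SE": se, "SP": sp, "Precision": precision,
            "BA": ba, "F1": f1, "MCC": mcc}


# Hypothetical confusion matrix
metrics = classification_metrics(tp=30, tn=50, fp=10, fn=10)
```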

The optimised hyperparameters were then used to retrain the model on the full training set and used to make predictions for the validation set described in the next section.

2.9. Validation Set

The dataset was split into a training and test set by stratified random sampling using the hit call outcome label to retain the ratio of actives to inactives. This specific test set termed the validation set was not used in any of the model development or cross validation evaluations, but reserved to evaluate performance of the final model using naïve cases.
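A stratified random split of this kind can be sketched without external libraries as shown below. The 20% validation fraction, the seed and the function name are assumptions made for illustration; the text does not state the split ratio used.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=42):
    """Split indices so each class keeps roughly the same share in both sets."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


# Hypothetical hit-calls: 30 actives and 70 inactives
hit_calls = [1] * 30 + [0] * 70
train_idx, validation_idx = stratified_split(hit_calls)
n_validation_actives = sum(hit_calls[i] for i in validation_idx)
```

The validation set keeps the 30:70 ratio of actives to inactives, which is the property the stratification is meant to preserve.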

It is acknowledged that such a random split could still result in producing overly optimistic estimates of the model’s generalisability due to the training set potentially overlocalising the model to particular regions of chemical space. Indeed, a number of data splitting strategies for QSAR model development exist as investigated in Puzyn et al. [41]. For example, in van Tilborg et al. [42], substances were first clustered on substructure similarity using spectral clustering on extended connectivity fingerprints. For each cluster, substances were split into a training and test set by stratified random sampling to ensure a proportional representation of the number of activity cliff substances in both train and test sets as well as preserving structural similarity between training and test substances. Halder et al. [43] have applied k-means cluster analysis (kMCA) based data division. This divides the data into n (user specified) clusters on the basis of input descriptors. Subsequently, a specific number of validation set samples are randomly collected from each cluster. Other approaches beyond random splitting include the Kennard-Stone (KS) algorithm [44] which is based on the calculation of the Euclidean distance between samples. The KS algorithm starts by choosing the 2 substances that have the largest Euclidean distance and placing them into the test set. Subsequent points are selected that maximise the Euclidean distance from previously selected substances. This process of adding substances to the test set stops when the set reaches the size specified beforehand.

Figure 1 captures the main steps of the modelling workflow applied in this study.

Figure 1. Modelling workflow steps

2.10. Applicability Domain (AD)

A global applicability domain was determined using the leverage score as described by Gramatica et al. [45]. As summarised in a recent review by Rakhimbekova et al. [46], the leverage h of a test substance is based on the “hat” matrix and is computed as $h = x_1^T (X^T X)^{-1} x_1$, where X is the feature matrix of the training set of substances and $x_1$ is the feature vector of the test substance. The leverage threshold is typically defined as 3*(M+1)/N where M is the number of features and N is the number of training substances. The validation set of substances were subjected to the leverage threshold calculation to determine how many substances fell outside of the applicability domain.
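The leverage calculation can be illustrated with a small dependency-free sketch. It solves the linear system by Gaussian elimination instead of forming the inverse explicitly, and it assumes the matrix of training features has full column rank; with real fingerprint data, where columns can be collinear, a pseudo-inverse would likely be needed. All names and the toy matrix are hypothetical.

```python
def solve(a, b):
    """Solve a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (m[r][n] - tail) / m[r][r]
    return z


def leverage(x_new, x_train):
    """h = x^T (X^T X)^{-1} x for one test substance."""
    n_feat = len(x_new)
    xtx = [[sum(row[i] * row[j] for row in x_train) for j in range(n_feat)]
           for i in range(n_feat)]
    z = solve(xtx, x_new)
    return sum(xi * zi for xi, zi in zip(x_new, z))


def leverage_threshold(n_features, n_train):
    """Warning leverage 3 * (M + 1) / N."""
    return 3 * (n_features + 1) / n_train


# Toy training matrix: 4 substances described by 2 binary features
x_train = [[1, 0], [0, 1], [1, 1], [1, 0]]
h = leverage([1, 1], x_train)
h_star = leverage_threshold(n_features=2, n_train=4)
inside_domain = h <= h_star
```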

A local applicability domain was also derived by first calculating the average Jaccard distance of each training substance from its five nearest neighbours. The 95th percentile of the corresponding distance matrix was then derived and set as the threshold. The average distance of a test substance from its five nearest neighbours in the training set was then compared with the defined threshold. If the average distance of the test substance was less than or equal to the threshold value, it was considered inside the applicability domain (AD).
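One possible reading of this local applicability domain is sketched below, with fingerprints represented as sets of "on" bit indices. The sketch takes the 95th percentile over the per-substance mean nearest-neighbour distances (an interpretation of the description above), uses a linear-interpolation percentile, and sets k=2 in the toy example only because the invented training set is tiny; the study used five neighbours.

```python
def jaccard_distance(a, b):
    """Jaccard distance between two sets of 'on' fingerprint bits."""
    union = len(a | b)
    return 0.0 if union == 0 else 1.0 - len(a & b) / union


def mean_knn_distance(query, others, k=5):
    """Mean Jaccard distance from query to its k nearest neighbours in others."""
    nearest = sorted(jaccard_distance(query, o) for o in others)[:k]
    return sum(nearest) / len(nearest)


def percentile(values, q):
    """Percentile with linear interpolation between order statistics."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def local_ad_threshold(train, k=5, q=95.0):
    """Threshold from each training substance's mean distance to its neighbours."""
    per_substance = [mean_knn_distance(fp, train[:i] + train[i + 1:], k)
                     for i, fp in enumerate(train)]
    return percentile(per_substance, q)


def in_domain(query, train, threshold, k=5):
    return mean_knn_distance(query, train, k) <= threshold


# Toy training fingerprints (sets of bit indices); k=2 because the set is tiny
train = [{1, 2, 3}, {1, 2, 4}, {1, 2, 5}, {1, 3, 4}, {2, 3, 4}, {1, 2, 3, 4}]
threshold = local_ad_threshold(train, k=2)
similar_inside = in_domain({1, 2, 3, 4}, train, threshold, k=2)
dissimilar_inside = in_domain({90, 91, 92}, train, threshold, k=2)
```

A query sharing most bits with the training set falls inside the domain, whereas one with no bits in common sits at the maximum Jaccard distance of 1 and falls outside.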

2.11. Application of modelling results to an inventory of regulatory interest

The TSCA Non-confidential Active inventory hosted on the EPA CompTox Chemicals Dashboard (www.comptox.epa.gov/dashboard, at the time of writing, September 2022) was downloaded and processed through Corina Symphony to generate TxPs. The TSCA inventory hosted on the Dashboard comprises 33,365 substances of which 14,365 have a defined structure (the discrepancy lies in the number of UVCBs – chemical substances of unknown or variable composition, complex reaction products and biological materials). This list is based on notifications through February 2018 and substances have been unambiguously mapped to DSSTox using CASRN and chemical names. The substances on the inventory are designated “active” to indicate active in US commerce. TxPs could be generated for 14,067 substances in the list.

3.0. Data analysis and processing

Pipelining of the MSTI data was conducted using R version 4.0.2 and the tcpl library version 2.1.0 (https://cran.r-project.org/package=tcpl). All other analysis was performed using Python 3.9 within Jupyter notebooks using standard Python libraries: NumPy [47], Pandas [48], Matplotlib [49], Seaborn [50], RDKit (RDKit: Open-source cheminformatics; http://www.rdkit.org) [51] and Scikit-learn [38]. GTM relied upon the ugtm package developed by Gaspar [52]. The code repository supporting this analysis is available at https://github.com/g-patlewicz/msti-tox21. All data files are available at https://doi.org/10.23645/epacomptox.22587706.

4.0. Results and Discussion

4.1. MSTI assay performance in screening and overall activity

To identify thiol-reactive small molecules, the MSTI assay was used to screen the Tox21 10K chemical library containing 7,872 unique chemicals. In the primary screening, MLS (positive control) significantly reduced MSTI fluorescence signal in a concentration responsive manner with an IC50 of 0.32 ± 0.09 µM. The screening performed well with signal-to-background ratio of 2.38 ± 0.75, coefficient of variation of 2.19 ± 0.34 %, and Z’-factor of 0.74 ± 0.04. Amongst the 9,670 samples tested in the screening, 2,688 samples (27.8%) tested positive and the other 6,982 (72.2%) tested negative. This hit rate appears quite high relative to the hit rate of other Tox21 viability assays (median hit rate across the 85 viability assays where 1000 or more substances were screened was 11%). That said, the MSTI hit rate might not be unreasonable given the top target concentration of 230 µM is very high and some ~30% of chemicals could not achieve this level due to stock concentration limitations. The MSTI assay may also be capturing other sources of signal and interference: exploring the Tox21 chemical level hit rate distributions revealed some 28% of MSTI actives had active hit rates > 20%, suggesting these MSTI actives are promiscuous (see Figure S1). Figure 2a shows the proportion of samples that tested positive or negative in the MSTI assay. Figure 2b shows the distribution of maximum median responses across the samples tested with a vertical line showing the maximum median response for the positive control, MLS.

Figure 2: (a) Frequency of hit-call outcomes across all MSTI samples tested; (b) the distribution of the maximum median response (max_med) values. The red vertical line indicates the maximum median response of the positive control, MLS.

There were 1,582 replicated substances that had more than one experimental result, from which the single assay analysis (described in Section 2.3) could be conducted. Figure 3 shows the results from this analysis.

Figure 3: Unalikeability frequency plot for replicate samples. More than 85% of substances showed agreement across their resampled values, demonstrating a high consistency in the hit-call outcomes.

There was agreement in the assay outcomes for 85.78% (1,357) of substances based on the resampled replicate studies. Although agreement was high, a conservative approach was used, taking the positive outcome (i.e. the maximum value) as the result to be carried forward in the analysis where replicates gave discordant positive and negative outcomes.

Having aggregated replicated outcomes, the hit-call distribution shifted very slightly such that 71% (5,593) of tested substances were negative and 29% (2,279) were positive.

4.2. Landscape projection

Figure 4 shows the t-SNE 2D projection for the structurable substances on the basis of TxP fingerprints. The projection is colour coded by the hit-call activity revealing a broad representation of reactive chemicals throughout the landscape and only isolated clusters where many active substances appeared to be closely aggregated together.

Figure 4: 2D t-SNE projection for substances characterised by TxP features and colour coded by MSTI activity where a ‘1’ label indicates active (reactive) and a ‘0’ label indicates inactive (not reactive).

A similar broad representation across the structural space was found using GTM as shown in Figure 5.

Figure 5: GTM using means for the TxP fingerprints with colour coding based on hit-call information

Overall, there was no apparent discrimination based on TxP features that would help separate reactive substances from non-reactive substances. A multivariate response permutation procedure (MRPP) [53] was performed using the TxPs, computing their pairwise distance matrices on the basis of a Jaccard metric. This revealed that there was a statistically significant difference between substances that were active and those that were inactive (see Figure S2 supplementary information).

4.3. Enriched ToxPrints

There were 47 TxPs found to be enriched in the active space. These are listed in Table 1. For illustrative purposes, the first four are discussed in brief. The four TxPs were bond:N=N_azo_aliphatic_acyclic, bond:CN_amine_aromatic_benzidine, ring:hetero_[5]_O_furan_a-nitro and ring:hetero_[4]_N_beta_lactam.

Table 1.

List of enriched TxPs and potential inferences of their associations with known organic reaction chemistry principles or other structural alert schemes

OR TxP TP FN FP TN Tested_Active FDR_p Comments
inf bond:N=N_azo_aliphatic_acyclic 4 2181 0 5347 2185 0.039 Substances are all dyes. Potential for Schiff Base formation or SN2 reactions to occur
32.00 bond:CN_amine_aromatic_benzidine 13 2172 1 5346 2185 1.16E-05 Primary aromatic amine that can be activate to an electrophilic nitrenium ion.
29.52 ring:hetero_[5]_O_furan_a-nitro 12 2173 1 5346 2185 3.49E-05 Potential to be transformed to a reactive intermediate similar to that for aromatic amines.
21.45 ring:hetero_[4]_N_beta_lactam 43 2142 5 5342 2185 9.01E-17 Potential to act as an acylating agent.
17.87 ring:hetero_[4]_N_azetidine 43 2142 6 5341 2185 4.92E-16 Superseded by the previous TxP for beta-lactams. Same substances identified.
14.03 ring:hetero_[4]_Z_generic 45 2140 8 5339 2185 1.24E-15 Generalised TxP to characterise strained rings such as epoxides and aziridines. Implicated in SN2 reaction pathways.
12.29 bond:S(=O)O_sulfonicEster_alkyl_S-C 10 2175 2 5345 2185 0.0013 Low MW alkyl sulfates have been implicated for irritation effects.
11.72 bond:CC(=O)C_ketone_alkene_cyclic_3-en-1-one 19 2166 4 5343 2185 2.04E-06 Generic feature but indicative of alpha, beta unsaturated ketones – known to react by Michael addition
11.48 bond:CC(=O)C_quinone_1_4-benzo 46 2139 10 5337 2185 6.49E-15 Quinones are electrophilic by virtue of their ability to act as Michael acceptors to form adducts
10.50 bond:N=N_azo_aromatic 38 2147 9 5338 2185 6.91E-12 Aromatic azo compounds have been implicated in carcinogenicity. Mechanism thought to include reduction, cleavage of the azo group to the corresponding amine. Activation to the reactive nitrenium could then follow.
8.94 bond:CC(=O)C_quinone_1_4-naphtho 36 2149 10 5337 2185 1.44E-10 Potential to act as Michael acceptors owing to the quinone structural feature
8.64 ring:hetero_[6_6_6]_N_S_phenothiazine 21 2164 6 5341 2185 2.83E-06 Implicated for liver toxicity
7.39 bond:NN_hydrazine_acyclic_(connect_noZ) 59 2126 20 5327 2185 1.59E-15 Hydrazine derivatives
6.98 ring:hetero_[5]_N_tetrazole 17 2168 6 5341 2185 0.0001 Potential nephrotoxicant
6.93 bond:C=N_imine_C(connect_H_gt_0) 28 2157 10 5337 2185 2.34E-07 Some substances have the imine functionality in the para position of a conjugated aromatic system – potential for Michael addition
6.92 bond:N=N_azo_generic 39 2146 14 5333 2185 4.96E-10 Discussed under bond:N=N_azo_aromatic
6.55 bond:X~Z_halide-[N_P]_heteroatom_N_generic 8 2177 3 5344 2185 0.02
6.43 bond:NC=O_urea_thio 26 2159 10 5337 2185 1.43E-06
6.34 ring:hetero_[5]_Z_1_2_3_4-Z 18 2167 7 5340 2185 0.0001 Generic version that identifies tetrazole containing substances
5.73 bond:CX_halide_alkyl-F_perfluoro_octyl 7 2178 3 5344 2185 0.045 Potentially reactive as peroxisome proliferators (PP); PPARalpha is thought to mediate most of the PP effects, and an increase in DNA damage due to oxidative stress is thought to be implicated in rat liver carcinogenicity
5.62 ring:hetero_[5_6]_N_benzimidazole 34 2151 15 5332 2185 9.16E-08 Benzimidazoles have been implicated in carcinogenicity – possibly due to inhibition of hormones
5.45 bond:C=N_imine_N(connect_noZ) 33 2152 15 5332 2185 2.10E-07 Generic C=N moiety – implicated in bond:C=N_imine_C(connect_H_gt_0)
5.18 bond:N(=O)_nitro_aromatic 200 1985 102 5245 2185 2.82E-40 Aromatic nitro groups could become reduced to their corresponding nitroso, hydroxylamine and amine.
4.94 chain:alkeneLinear_diene_1_3-butene 26 2159 13 5334 2185 1.29E-05 Generic TxP which will be associated with other functional groups like carbonyls to permit Michael addition
4.91 chain:alkeneLinear_diene_1_4-diene 10 2175 5 5342 2185 0.017 As above
4.91 bond:X~Z_halide-[N_P]_heteroatom_N 8 2177 4 5343 2185 0.041 Potential SN2 reaction at halogen atom – where halogen is Cl. Chemicals containing Cl have been suggested to be oxidised to an epoxide which can then undergo an SN2 reaction.
4.61 bond:C(=O)N_carbamate_dithio 15 2170 8 5339 2185 0.003 Potential acylating agent
4.61 bond:COH_alcohol_alkene 39 2146 21 5326 2185 1.21E-07
4.60 bond:CN_amine_ter-N_aromatic 168 2017 95 5252 2185 5.8E-31 Potential SN1 reaction mechanism. Dealkylation to form a primary amine
4.54 bond:CN_amine_ter-N_aromatic_aliphatic 136 2049 77 5270 2185 6.77E-25 As above
4.50 bond:CX_halide_alkyl-F_perfluoro_hexyl 11 2174 6 5341 2185 0.015 Peroxisome proliferators as discussed for the perfluoro_octyl chain
4.50 bond:quatP_phosphonium 11 2174 6 5341 2185 0.015 Quats are known irritants
4.48 bond:N(=O)_nitro_C 207 1978 122 5225 2185 6.87E-37 Generalised feature for bond:N(=O)_nitro_aromatic
4.37 bond:COH_alcohol_alkene_cyclic 37 2148 21 5326 2185 5.62E-07
4.10 bond:S=O_sulfoxide 15 2170 9 5338 2185 0.0049 Generalised TxP – could be flagging for bond:S(=O)O_sulfonicEster_alkyl_S-C
3.99 ring:fused_PAH_anthracene 13 2172 8 5339 2185 0.012 Potential for oxidation to form epoxides
3.81 ring:hetero_[5_6]_N_indole 67 2118 44 5303 2185 1.24E-10 See ring:hetero_[5]_N_pyrrole
3.74 bond:CN_amine_aromatic_generic 525 1660 417 4930 2185 4.17E-73 Aromatic amines leading to formation of nitrenium ions
3.67 bond:C=S_carbonyl_thio_generic 53 2132 36 5311 2185 3.07E-08 Isothiocyanates (acylation) or associate with disulfide bridges (SN2 thio-disulphide interchange)
3.64 bond:C=O_acyl_hydrazide 28 2157 19 5328 2185 0.00011 Toxicity of hydrazine derivatives is thought to be due to the formation of reactive species such as reactive oxygen species, carbocations or carbon-centred radicals
3.51 bond:CN_amine_pri-NH2_aromatic 260 1925 198 5149 2185 1.215E-35 Aromatic amines leading to the formation of nitrenium ions
3.47 bond:S(=O)O_sulfonicAcid_cyclic_(ring) 68 2117 49 5298 2185 8.3E-10
3.46 bond:C(=O)O_carboxylicAcid_alkenyl 84 2101 61 5286 2185 6.92E-12 Acrylate like features – potential to react as Michael acceptor
3.43 bond:NN_hydrazine_alkyl_N(connect_Z=1) 25 2160 18 5329 2185 0.0005 Hydrazine derivatives as noted above
3.29 ring:hetero_[5]_N_pyrrole 76 2109 58 5289 2185 3.35E-10 Pyrroles are cyclic aliphatic amines and likely to act as narcotic amines. Based on the substances, the pyrrole is usually fused with a benzene ring (indole) such that a secondary aromatic amine mechanism is more likely
3.01 chain:alkeneCyclic_diene_cyclopentadiene 22 2163 18 5329 2185 0.0039
3.00 ring:hetero_[5]_O_furan 46 2139 38 5309 2185 7.21E-06 More generalised TxP for ring:hetero_[5]_O_furan_a-nitro
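
The odds ratios (OR) in Table 1 can be reproduced from the four counts in each row. The sketch below assumes, based on the column totals, that TP and FN are the active substances with and without the TxP, and that FP and TN are the inactive substances with and without it; the function name is illustrative and only the first four rows are included.

```python
import math


def odds_ratio(tp, fn, fp, tn):
    """Odds ratio for a 2x2 table: (TP * TN) / (FP * FN)."""
    denominator = fp * fn
    if denominator == 0:
        # e.g. no inactive substance carries the TxP, as in the first row
        return math.inf
    return (tp * tn) / denominator


# (TP, FN, FP, TN) copied from the first four rows of Table 1
table_1_rows = {
    "bond:N=N_azo_aliphatic_acyclic": (4, 2181, 0, 5347),
    "bond:CN_amine_aromatic_benzidine": (13, 2172, 1, 5346),
    "ring:hetero_[5]_O_furan_a-nitro": (12, 2173, 1, 5346),
    "ring:hetero_[4]_N_beta_lactam": (43, 2142, 5, 5342),
}

odds_ratios = {txp: odds_ratio(*counts) for txp, counts in table_1_rows.items()}
```

Rounded to two decimal places, these values match the OR column (inf, 32.00, 29.52 and 21.45). The FDR-adjusted p-values are not recomputed here.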

There were four substances that flagged for TxP bond:N=N_azo_aliphatic_acyclic. Inspection of the structures revealed them all to be Acid Yellow dyes containing the moiety shown in Figure 6.

Figure 6.

Common scaffold in the substances flagging TxP bond:N=N_azo_aliphatic_acyclic.

Based on this structural motif, there are two possible reaction pathways: an SN2 reaction at the carbon connected to the N=N moiety, or the pyrazolone acting as a Schiff base former (based on alerts captured within the OECD Toolbox).

For the 14 substances presenting TxP bond:CN_amine_aromatic_benzidine, the common scaffold was benzidine itself, a known carcinogen. The benzidine scaffold is a biphenyl ring with amino groups in the para positions. The alert that is triggered from this scaffold is for a primary aromatic amine. Primary aromatic amines have to be metabolised to reactive electrophiles to exert their carcinogenic potential. This typically involves an oxidation to form an N-hydroxylamine, which is then activated to form a reactive nitrenium ion [54].

TxP ring:hetero_[5]_O_furan_a-nitro appeared in 13 substances. A potential reaction pathway is that the nitro group could be transformed into an N-hydroxylated intermediate, analogous to the manner in which primary aromatic amines ultimately form a reactive nitrenium ion [55].

TxP ring:hetero_[4]_N_beta_lactam appeared in 48 substances, many of which were antibiotics including Penicillin. Beta-lactams have the potential to act by an acylation reaction mechanism as discussed in Roberts et al. [56].

Table 1 attempts to summarise plausible rationales for how the TxP features might relate to reactivity potential, based on inspection of structures and organic chemistry principles or by virtue of existing alert schemes either evaluated in Nelms et al. [24] or made available as profilers within the OECD Toolbox. TxPs are structured in a hierarchy, and notable among the enriched features was how generic features were often identified in conjunction with their more specific variants, e.g. bond:CN_amine_ter-N_aromatic and bond:CN_amine_ter-N_aromatic_aliphatic. A number of TxPs indicative of potential mechanisms associated with aromatic amines were also identified. Despite some redundancy in the TxPs identified, each of the TxPs was consistent with structural features expected to be associated with reactivity.

4.4. Comparison of MSTI outcomes with reaction domains

In Nelms et al. [24], the Tox21 library was profiled using three different reaction alert schemes. These identified 45% of Tox21 substances as positive, i.e. a reaction domain could be assigned indicative of electrophilic potential, and 55% as negative, i.e. no alerts identified. It is worth noting that the presence of a reaction domain is not a guarantee of reactivity; a substance might possess the potential to react but be found non-reactive upon testing. Moreover, the reaction domains cover both hard and soft electrophiles whereas the MSTI assay (by virtue of acting as a thiol probe) is more likely to identify soft electrophiles; hence there is an expectation that it should be better at identifying Michael acceptors, SNAr and SN2 electrophiles than hard electrophiles such as Schiff base formers and acyl transfer agents.

In comparing the MSTI outcomes with those reported in Nelms et al. [24] for the 7,363 substances that were in common, the precision (positive predictive value), recall (sensitivity) and specificity were determined to be 33%, 50% and 59% respectively.

If the Nelms et al. data were filtered to retain only soft electrophiles, based on their domain being one of SNAr, SN2 or Michael acceptors (MA), the MSTI assay identified 28% of the 6,766 substances in common as positive and 72% as negative, whereas the reaction domains predicted 19% as positive and 81% as negative.

However, agreement between the MSTI dataset and the Nelms et al. data did not improve: precision was similar at 35% and recall was lower at 24% (Table 2).

Table 2:

Confusion matrix of the MSTI experimental hit-call outcomes relative to reaction domain assignments reported in Nelms et al. [24]

MSTI positive MSTI negative
Nelms positive 1077 2168
Nelms negative 1061 3057
Nelms soft electrophile positive 460 853
Nelms soft electrophile negative 1459 3994
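
The percentages quoted in this section can be recovered from the counts in Table 2. The sketch below treats the reaction domain assignment as the prediction and the MSTI hit-call as the reference outcome; this orientation is an assumption, chosen because it reproduces the reported values, and the function name is illustrative.

```python
def classification_metrics(tp, fp, fn, tn):
    """Precision, recall (sensitivity) and specificity from 2x2 counts."""
    return {
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Counts from Table 2 (alert = prediction, MSTI hit-call = reference)
all_alerts = classification_metrics(tp=1077, fp=2168, fn=1061, tn=3057)
soft_only = classification_metrics(tp=460, fp=853, fn=1459, tn=3994)
```

With these counts, `all_alerts` rounds to 0.33, 0.50 and 0.59 for precision, recall and specificity, and `soft_only` gives a precision of 0.35 and a recall of 0.24.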

Accordingly, there appears to be little correspondence between these structural alerts, which themselves are largely derived from skin sensitisation data, and the MSTI results here. This highlights, in part, the difference in chemical diversity between the Tox21 library and the skin sensitisation set of chemicals upon which the majority of those reaction domains were originally developed.

4.5. QSAR Classification Models for reactivity using the Tox21 MSTI screening data

QSAR models were developed to predict MSTI activity. A range of linear and non-linear machine learning algorithms were used to build classification models using TxPs as descriptors. The dataset was first split at random into separate training and test sets using a stratified approach to ensure the proportion of actives and negatives was conserved in both datasets: 80% of the dataset was used for training and the remaining 20% was held out as a final validation set. Removing constant and near-constant features resulted in 271 TxPs remaining from the original set of 729.

Initially a simple baseline using the most frequent hit-call value was applied to provide a minimum baseline performance estimate using balanced accuracy (BA) as the metric. Using a stratified 10-fold CV, the mean balanced accuracy was determined to be 50%.
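
The 50% baseline follows directly from the definition of balanced accuracy as the mean of sensitivity and specificity: a model that always predicts the majority class scores 0 on one and 1 on the other. A minimal sketch, using hypothetical labels rather than the study data:

```python
from collections import Counter


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (1 = active)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return (sensitivity + specificity) / 2


def most_frequent_baseline(y_train, n_predictions):
    """Predict the most frequent training label for every substance."""
    majority = Counter(y_train).most_common(1)[0][0]
    return [majority] * n_predictions


# Hypothetical labels with a minority of actives
y_train = [1] * 3 + [0] * 7
y_test = [1, 0, 0, 1, 0]
baseline_predictions = most_frequent_baseline(y_train, len(y_test))
baseline_ba = balanced_accuracy(y_test, baseline_predictions)
```

Here every test substance is predicted inactive, so sensitivity is 0, specificity is 1 and `baseline_ba` is 0.5 regardless of the class ratio.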

The mean balanced accuracy and standard deviation (std) of a 10-fold stratified CV using an initial Random Forest Classifier (RFC) with 100 estimators (trees) were 0.71 and 0.02 respectively for the test set. In contrast, the mean and std of the BA for the training set were 0.94 and 0.0025 (see Figure 7).

Figure 7.

CV Balanced Accuracy scores for initial baseline Random Forest model

Computing the validation curve revealed that adding more estimators neither reduced the difference between the mean BA of the training and test sets nor resulted in any appreciable improvement in performance. The mean performance plateaued after 50 estimators (Figure 8).

Figure 8.

Validation curve for baseline random forest classifier

Other base classifiers were then investigated for comparison with Random Forest: Support Vector Machines (linear and radial), Gaussian Naive Bayes (GNB), a ridge classifier, neural networks (MLP), Linear Discriminant Analysis (LDA) and k-nearest neighbours (KNN, with k = 5, 10, 20). Figure 9 depicts the mean BA for each model using a 10-fold stratified CV procedure. Table 3 shows the means of the BA, F1, MCC and Recall values from the CV procedure.

Figure 9:

Mean 10-fold CV BA for the nine models evaluated

Table 3:

Selected mean performance metrics from the 10-fold stratified CV of other machine learning approaches investigated

model cv_BA_mean cv_F1_mean cv_MCC_mean cv_Recall_mean
LinearSVC 0.673 0.528 0.399 0.44
SVC 0.696 0.566 0.473 0.451
GNB 0.679 0.552 0.327 0.703
MLP 0.699 0.571 0.4 0.567
LDA 0.68 0.539 0.405 0.461
Ridge 0.663 0.507 0.393 0.404
knn-5 0.674 0.527 0.415 0.426
knn-10 0.634 0.438 0.376 0.308
knn-20 0.609 0.377 0.337 0.251
RF 0.713 0.595 0.477 0.51

From Figure 9, the RFC appeared most promising in terms of its BA performance though there was little difference between the models aside from the KNN with a large number of neighbours performing the worst. On the basis of the mean MCC values, the RFC appeared to perform better (0.477) than all the other models.

Hyperparameter tuning was then performed using a nested CV procedure. The number of trees and maximum leaf nodes were tuned using a grid search 10-fold stratified CV procedure. The number of trees varied from 1 to 200 whereas number of leaf nodes varied from 2 to 50. The mean CV BA derived was 0.645 and the std was 0.0147.

The mean inner CV results for each parameter combination, across the models of the outer CV, were explored in a boxplot (Figure 10) to help identify the best parameters for the final RFC model.

Figure 10:

Inner CV BA results with parameter (number of trees, max leaf nodes) combinations

The best parameters identified through the grid search were 50 max leaf nodes and 50 estimators. This is consistent with the inner CV BA results shown in Figure 10.

One benefit of the RFC is that it is possible to uncover which TxPs in the model have the largest importance. Inspection of single trees does show that these features are often at the root-level split of the constituent decision trees (see Figure S3 in the supplementary information as an example). Figure 11 shows the top 10 TxPs in the RFC model developed.

Figure 11:

Top 10 most important features in the RFC model

Of the 159 features that are required for 95% of cumulative importance, 28 overlapped with the enriched TxPs discussed in Section 4.3. Within the top 10 shown in Figure 11, there is overlap between the features themselves: substances with the bond:COH_alcohol_aromatic_phenol TxP will also contain the ring:aromatic_benzene TxP. There is an inherent hierarchy within the ToxPrints such that features are not entirely independent from one another. This is evident with bond:CN_amine_pri-NH2_aromatic being a more specific TxP of the more generic bond:CN_amine_pri-NH2_generic, and likewise with bond:N(=O)_nitro_aromatic relative to bond:N(=O)_nitro_C.
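
The number of features needed to reach a given share of cumulative importance can be counted by sorting the importances in descending order and accumulating them. The sketch below uses hypothetical importance values, not those of the fitted model, and an illustrative function name.

```python
def n_features_for_cumulative_importance(importances, threshold=0.95):
    """Smallest number of top-ranked features whose importances sum to at
    least `threshold` of the total importance."""
    total = sum(importances)
    running = 0.0
    for count, value in enumerate(sorted(importances, reverse=True), start=1):
        running += value
        if running >= threshold * total:
            return count
    return len(importances)


# Hypothetical importances for five features
toy_importances = [0.03, 0.6, 0.01, 0.3, 0.06]
n_needed = n_features_for_cumulative_importance(toy_importances, threshold=0.95)
```

In this toy case the three largest values (0.6, 0.3 and 0.06) already account for 96% of the total, so `n_needed` is 3. Because related TxPs share importance through the hierarchy noted above, such a count tends to overstate the number of independent structural features.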

The RFC was refit to the training set using 50 estimators and 50 max leaf nodes. The performance metrics for the hold out (validation set) that was not involved in the training/testing are provided in Table 4.

Table 4:

Performance metrics for the hold out set

Metric Hold out set
BA 0.648
MCC 0.387
Sensitivity (recall) 0.35
Precision 0.725
Specificity 0.946

4.6. Derivation of applicability domain

A leverage score was used to define the global applicability domain. For the hold out validation set, 63 substances (4.2%) were outside of the applicability domain as defined by the leverage score. These are provided in the supplementary data files (see https://doi.org/10.23645/epacomptox.22587706).

A local applicability domain was determined by calculating the pairwise distances of the 5 nearest neighbours of each of the training set substances. The mean of these values was computed and the distribution of the mean values plotted as an empirical cumulative distribution function (ECDF) (Figure S4). The 95th percentile was computed to be 0.514 and taken as the local AD threshold. For the validation set substances, the five closest neighbours from the training set were identified and the mean of their pairwise distances computed. If the mean of the validation set pairwise distances exceeded the AD threshold, the validation substance was denoted as out of domain of the model. For the 1,507 substances in the validation set, 138 substances (9.2%) were found to be out of domain (see supplementary information).
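
One way to read the local AD procedure is sketched below. It rests on several assumptions not confirmed in this section: the distance is taken to be Jaccard distance on fingerprint bit sets, the score for a substance is the mean distance to its k nearest training neighbours (excluding itself for training substances), and the percentile uses a simple nearest-rank rule. All function names are illustrative.

```python
import math


def jaccard_distance(a, b):
    """Jaccard distance between two sets of 'on' fingerprint bits."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def mean_knn_distance(query, reference, k=5):
    """Mean distance from `query` to its k nearest neighbours in `reference`."""
    nearest = sorted(jaccard_distance(query, ref) for ref in reference)[:k]
    return sum(nearest) / len(nearest)


def local_ad_threshold(training, k=5, percentile=95):
    """Percentile (nearest-rank) of the leave-one-out mean k-NN distances."""
    scores = sorted(
        mean_knn_distance(fp, training[:i] + training[i + 1:], k)
        for i, fp in enumerate(training)
    )
    rank = max(1, math.ceil(percentile / 100 * len(scores)))
    return scores[rank - 1]


def in_local_domain(query, training, threshold, k=5):
    """A substance is out of domain if its mean k-NN distance exceeds the threshold."""
    return mean_knn_distance(query, training, k) <= threshold
```

For a new substance, `in_local_domain(fingerprint, training, threshold)` returns `False` when its nearest training neighbours are, on average, further away than the threshold derived from the training set.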

The percentage of positive substances that were outside of domain based on global and local approaches was 34.1%. The substances that were determined to be outside of the domain from the two approaches were largely different with only 6 substances that were in common between both approaches defining the AD.

Inspecting the incorrect model predictions (cf. experimental data) relative to the AD found no apparent relationship between a substance being out of domain based on either approach and its prediction. Frequency plots were constructed comparing the extent to which predictions corresponded with experimental outcomes and overlaid by whether predictions were in or out of domain (see supplementary Figures S5 and S6).

4.7. Application to the TSCA Active inventory (non-confidential).

TxPs could be generated for 14,067 substances on the TSCA active non-confidential list that is hosted on the public CompTox Chemicals Dashboard. Predictions were made for this set of substances using the final RF model to determine how many substances were more likely to give rise to thiol reactivity. 11.5% of the substances were predicted to be reactive. The global and local applicability thresholds were applied to the set of chemicals. A small proportion of substances (~600 substances) fell outside of the domain of applicability as shown in the frequency plot in Figure 12.

Figure 12.

Count plot of the MSTI predictions from the RF model contrasted with the counts based on applicability information. The frequency counts in blue show the number of substances predicted to be positive (reactive) or negative (inactive) in the MSTI assay. The staggered bars in orange and green highlight, of all the active or inactive substances, how many were out of the domain based on either the leverage technique (global) or the nearest neighbour (local) domain. The intent is to provide a perspective of the scale of substances that fell out of domain by both techniques relative to the number of substances profiled for their likely MSTI outcome.

The enriched TxPs derived in Section 3.5 were also used as a crude screen to identify potential reactive substances on account of substances containing any one of the enriched TxPs. There were 2,903 substances that contained an enriched TxP. The enriched TxPs flagged many more substances as potentially reactive (low precision relative to the RF model) but the overall agreement between the two schemes was high (balanced accuracy 0.9, Table 5).

Table 5:

Confusion matrix for RF model predictions vs. Enriched ToxPrints

ToxPrint_enriched 0 1
RF model prediction
0 11,025 1,414
1 139 1,489

In practice, a combination of the enriched TxPs as flags and the RF model could be used to triage substances for potential thiol reactivity. Here, a sum of the predictions from both qualitative and quantitative approaches (maximum score of 2) was taken and downweighted by a factor of 0.5 if a substance was found to be outside of the AD. Figure 13 shows the frequency of the overall calls by DTXSID based on this proposed scoring scheme.

Figure 13.

Count plot for the final MSTI outcomes predicted based on a combination of RF and enriched TxPs accounting for AD information.
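
The scoring scheme proposed in this section can be sketched as follows. The text does not state whether "outside of the AD" refers to the global domain, the local domain or either, nor exactly how the downweighting is applied; the sketch assumes a single in-domain flag supplied by the caller and multiplication of the summed score by 0.5. The function name is illustrative.

```python
def triage_score(rf_prediction, has_enriched_txp, in_domain):
    """Combine the RF call and the enriched-TxP flag into a triage score.

    rf_prediction and has_enriched_txp are 0/1 (or bool); the sum has a
    maximum of 2 and is halved when the substance is outside the AD.
    """
    score = float(int(rf_prediction) + int(has_enriched_txp))
    if not in_domain:
        score *= 0.5
    return score


# Possible outcomes for a substance flagged by both approaches
score_in_domain = triage_score(1, 1, in_domain=True)
score_out_of_domain = triage_score(1, 1, in_domain=False)
```

Under these assumptions the score takes one of the values 0, 0.5, 1 or 2, with 2 reserved for in-domain substances flagged by both the RF model and an enriched TxP.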

5. Conclusions

Given the shortcomings of current Tox21 assays in assessing reactivity, a thiol-based assay as described by McCallum et al. [23] was used to measure reactivity potential for the Tox21 10K chemical library. An investigation of the feasibility of deriving qualitative and quantitative structural insights was undertaken for the dataset. Firstly, the landscape of the Tox21 chemical inventory was visualised relative to hit-call outcomes from the MSTI assay to explore the extent to which there were obvious clusters of chemicals that were active or not. TxPs were computed for the substances and odds ratios calculated to understand whether certain structural features were overrepresented in the tested active space and could serve as preliminary structural alert indicators. Forty-seven TxPs were identified that were particularly enriched. These were rationalised by chemical inspection in light of known reaction chemistry as far as possible. A comparison with structural alerts for reactivity as evaluated in a previous study by Nelms et al. [24] was performed; however, little correspondence was observed, likely because the reactivity alerts in Nelms et al. [24] largely originated from an evaluation of chemicals tested for their in vivo skin sensitisation potential. These skin sensitisation datasets are far less diverse in their chemistry and much smaller in size relative to the Tox21 chemical library. Moreover, it is worth noting that MSTI data, which reflect no biotransformation/bioactivation potential, were being compared with alerts derived from in vivo data.

A selection of different machine learning approaches was used to investigate the feasibility of using TxPs as descriptors in a predictive model. The RF model was carried forward due to its slightly higher balanced accuracy and its interpretability, in terms of being able to extract feature-importance information. Following hyperparameter optimisation, the RF model was rebuilt for the same set of substances and applied to a reserved set, giving a BA of 0.648. The applicability domain was assessed using two approaches, one global and one local, to provide additional information to assist in the evaluation of model predictions.

The final model was then applied to the TSCA Active non-confidential inventory. Predictions from the model were complemented by using the enriched TxPs as a SAR model. The sum of the predictions adjusted based on AD information resulted in final outcomes for the TSCA active set.

The combined application of these two structure-activity models shows promise as a means to triage untested chemicals for their potential electrophilic reactivity on the basis of chemical structural features. It is recognised that the approach undertaken, using enriched TxPs as crude SARs in conjunction with an RF model, represents only one potential scheme and could be refined further. Indeed, many virtual screening efforts have been published (particularly in the drug discovery domain; see reviews [57,58]) which vary in the types of descriptors used (from topological descriptors, e.g. TOPological Sub-Structural Molecular Design (TOPS-MODE) [59], to graph-based encodings [60]) as well as in the complexity of the modelling approach (from Classification and Regression Trees (CART) classifiers that lend themselves to the development of SARs [61] to deep learning approaches encompassing multitask modelling; see reviews [62, 63]).

The chemical interpretability of the TxPs, which could be rationalised relative to known alerts (as noted in Table 1), provided strong motivation to use TxPs in the model building phase. Another descriptor set, namely the full set of RDKit descriptors, was also investigated, but the mean cross-validated BA performance was found to be very similar to that derived using TxPs (results not shown) and this line of investigation was not pursued further.

The triage reported herein may be important in terms of selecting substances for further screening in bioactivity assays of increasing complexity or for contextualising the results of such assays for next generation risk assessments.

Supplementary Material

SI

Highlights.

  • Screened the Tox21 10K library using the MSTI assay to identify thiol-reactive compounds

  • Calculated ToxPrint enrichments and compared these to known structural alerts

  • Developed a Random Forest classification model to predict MSTI assay outcomes from structure

  • Applied the model and enriched ToxPrints to an inventory of regulatory interest

Acknowledgments

This study was supported in part by the Intramural Research Program of the National Center for Advancing Translational Sciences (NCATS), and Interagency Agreement IAA #NTR 12003 from the National Institute of Environmental Health Sciences/Division of the National Toxicology Program to the NCATS, by the National Institutes of Health and by appropriated funds of the US Environmental Protection Agency. We would like to thank Anton Simeonov for critical advice in performing the MSTI assay, Adam Yasgar for technical advice on MSTI assay, Jameson Travers and Carleen Klumpp-Thomas for online screening, Paul Shinn for compound management and EPA technical reviewers Chad Deisenroth and Nathaniel Charest for their comments. We also thank the anonymous reviewers for their valuable comments in improving the manuscript.

Footnotes

Disclaimer

The views expressed in this article are those of the authors and do not necessarily reflect the views or policies of the U.S. Environmental Protection Agency and National Center for Advancing Translational Sciences, National Institutes of Health. Mention of trade names or commercial products does not constitute endorsement or recommendation for use.

Conflict of Interest Statement

The authors report no conflicts of interest. The authors alone are responsible for the content and writing of this article.

References

  • [1] Huang R, A Quantitative High-Throughput Screening Data Analysis Pipeline for Activity Profiling, Methods Mol Biol 1473 (2016) 111–122. 10.1007/978-1-4939-6346-1_12.
  • [2] Thomas RS, Paules RS, Simeonov A, Fitzpatrick SC, Crofton KM, Casey WM, Mendrick DL, The US Federal Tox21 Program: A strategic and operational plan for continued leadership, ALTEX 35 (2018) 163–168. 10.14573/altex.1803011.
  • [3] Richard AM, Huang R, Waidyanatha S, Shinn P, Collins BJ, Thillainadarajah I, Grulke CM, Williams AJ, Lougee RR, Judson RS, Houck KA, Shobair M, Yang C, Rathman JF, Yasgar A, Fitzpatrick SC, Simeonov A, Thomas RS, Crofton KM, Paules RS, Bucher JR, Austin CP, Kavlock RJ, Tice RR, The Tox21 10K Compound Library: Collaborative Chemistry Advancing Toxicology, Chem. Res. Toxicol 34 (2021) 189–216. 10.1021/acs.chemrestox.0c00264.
  • [4] Capuzzi SJ, Politi R, Isayev O, Farag S, Tropsha A, QSAR Modeling of Tox21 Challenge Stress Response and Nuclear Receptor Signaling Toxicity Assays, Frontiers in Environmental Science 4 (2016). 10.3389/fenvs.2016.00003 (accessed January 23, 2023).
  • [5] Huang R, Xia M, Nguyen D-T, Zhao T, Sakamuru S, Zhao J, Shahane SA, Rossoshek A, Simeonov A, Tox21Challenge to Build Predictive Models of Nuclear Receptor and Stress Response Pathways as Mediated by Exposure to Environmental Chemicals and Drugs, Frontiers in Environmental Science 3 (2016). 10.3389/fenvs.2015.00085 (accessed January 23, 2023).
  • [6] Idakwo G, Thangapandian S, Luttrell J, Li Y, Wang N, Zhou Z, Hong H, Yang B, Zhang C, Gong P, Structure–activity relationship-based chemical classification of highly imbalanced Tox21 datasets, Journal of Cheminformatics 12 (2020) 66. 10.1186/s13321-020-00468-x.
  • [7] Wu L, Huang R, Tetko IV, Xia Z, Xu J, Tong W, Trade-off Predictivity and Explainability for Machine-Learning Powered Predictive Toxicology: An in-Depth Investigation with Tox21 Data Sets, Chem. Res. Toxicol 34 (2021) 541–549. 10.1021/acs.chemrestox.0c00373.
  • [8] Wang MWH, Goodman JM, Allen TEH, Machine Learning in Predictive Toxicology: Recent Applications and Future Directions for Classification Models, Chem. Res. Toxicol 34 (2021) 217–239. 10.1021/acs.chemrestox.0c00316.
  • [9] Cavasotto CN, Scardino V, Machine Learning Toxicity Prediction: Latest Advances by Toxicity End Point, ACS Omega 7 (2022) 47536–47546. 10.1021/acsomega.2c05693.
  • [10] Borrel A, Huang R, Sakamuru S, Xia M, Simeonov A, Mansouri K, Houck KA, Judson RS, Kleinstreuer NC, High-Throughput Screening to Predict Chemical-Assay Interference, Sci Rep 10 (2020) 3986. 10.1038/s41598-020-60747-3.
  • [11] Dahlin JL, Nissink JWM, Strasser JM, Francis S, Higgins L, Zhou H, Zhang Z, Walters MA, PAINS in the Assay: Chemical Mechanisms of Assay Interference and Promiscuous Enzymatic Inhibition Observed during a Sulfhydryl-Scavenging HTS, J Med Chem 58 (2015) 2091–2113. 10.1021/jm5019093.
  • [12] Bajorath J, Evolution of assay interference concepts in drug discovery, Expert Opin Drug Discov 16 (2021) 719–721. 10.1080/17460441.2021.1902983.
  • [13] Judson RS, Magpantay FM, Chickarmane V, Haskell C, Tania N, Taylor J, Xia M, Huang R, Rotroff DM, Filer DL, Houck KA, Martin MT, Sipes N, Richard AM, Mansouri K, Setzer RW, Knudsen TB, Crofton KM, Thomas RS, Integrated Model of Chemical Perturbations of a Biological Pathway Using 18 In Vitro High-Throughput Screening Assays for the Estrogen Receptor, Toxicol Sci 148 (2015) 137–154. 10.1093/toxsci/kfv168.
  • [14] Li Z, Lam YW, Liu Q, Lau AYK, Yu Au-Yeung H, Chan RHM, Machine Learning-Driven Drug Discovery: Prediction of Structure-Cytotoxicity Correlation Leads to Identification of Potential Anti-Leukemia Compounds, Annu Int Conf IEEE Eng Med Biol Soc 2020 (2020) 5464–5467. 10.1109/EMBC44109.2020.9175850.
  • [15] Webel HE, Kimber TB, Radetzki S, Neuenschwander M, Nazaré M, Volkamer A, Revealing cytotoxic substructures in molecules using deep learning, J Comput Aided Mol Des 34 (2020) 731–746. 10.1007/s10822-020-00310-4.
  • [16] Schultz TW, Carlson RE, Cronin MTD, Hermens JLM, Johnson R, O’Brien PJ, Roberts DW, Siraki A, Wallace KB, Veith GD, A conceptual framework for predicting the toxicity of reactive chemicals: modeling soft electrophilicity, SAR and QSAR in Environmental Research 17 (2006) 413–428. 10.1080/10629360600884371.
  • [17] Ankley GT, Bennett RS, Erickson RJ, Hoff DJ, Hornung MW, Johnson RD, Mount DR, Nichols JW, Russom CL, Schmieder PK, Serrano JA, Tietge JE, Villeneuve DL, Adverse outcome pathways: a conceptual framework to support ecotoxicology research and risk assessment, Environ Toxicol Chem 29 (2010) 730–741. 10.1002/etc.34.
  • [18] Schwöbel JAH, Koleva YK, Enoch SJ, Bajot F, Hewitt M, Madden JC, Roberts DW, Schultz TW, Cronin MTD, Measurement and Estimation of Electrophilic Reactivity for Predictive Toxicology, Chem. Rev 111 (2011) 2562–2596. 10.1021/cr100098n.
  • [19] Roberts DW, Aptula AO, Determinants of skin sensitisation potential, J Appl Toxicol 28 (2008) 377–387. 10.1002/jat.1289.
  • [20] Aptula AO, Roberts DW, Mechanistic applicability domains for nonanimal-based prediction of toxicological end points: general principles and application to reactive toxicity, Chem Res Toxicol 19 (2006) 1097–1105. 10.1021/tx0601004.
  • [21] Cronin MTD, Bajot F, Enoch SJ, Madden JC, Roberts DW, Schwöbel J, The In Chemico–In Silico Interface: Challenges for Integrating Experimental and Computational Chemistry to Identify Toxicity, Alternatives to Laboratory Animals 37 (2009) 513.
  • [22] LoPachin RM, Geohagen BC, Nordstroem LU, Mechanisms of Soft and Hard Electrophile Toxicities, Toxicology 418 (2019) 62–69. 10.1016/j.tox.2019.02.005.
  • [23] McCallum MM, Nandhikonda P, Temmer JJ, Eyermann C, Simeonov A, Jadhav A, Yasgar A, Maloney D, Arnold AL, High-throughput identification of promiscuous inhibitors from screening libraries with the use of a thiol-containing fluorescent probe, J Biomol Screen 18 (2013) 705–713. 10.1177/1087057113476090.
  • [24] Nelms MD, Lougee R, Roberts DW, Richard A, Patlewicz G, Comparing and contrasting the coverage of publicly available structural alerts for protein binding, Computational Toxicology 12 (2019) 100100. 10.1016/j.comtox.2019.100100.
  • [25] Patlewicz G, Casati S, Basketter DA, Asturiol D, Roberts DW, Lepoittevin J-P, Worth AP, Aschberger K, Can currently available non-animal methods detect pre and pro-haptens relevant for skin sensitization?, Regul Toxicol Pharmacol 82 (2016) 147–155. 10.1016/j.yrtph.2016.08.007.
  • [26] Filer DL, Kothiya P, Setzer RW, Judson RS, Martin MT, tcpl: the ToxCast pipeline for high-throughput screening data, Bioinformatics 33 (2017) 618–620. 10.1093/bioinformatics/btw680.
  • [27] Perry M, Kader G, Variation as Unalikeability, Teaching Statistics 27 (2005) 58–60. 10.1111/j.1467-9639.2005.00210.x.
  • [28] Grulke CM, Williams AJ, Thillainadarajah I, Richard AM, EPA’s DSSTox database: History of development of a curated chemistry resource supporting computational toxicology research, Comput Toxicol 12 (2019). 10.1016/j.comtox.2019.100096.
  • [29] Williams AJ, Grulke CM, Edwards J, McEachran AD, Mansouri K, Baker NC, Patlewicz G, Shah I, Wambaugh JF, Judson RS, Richard AM, The CompTox Chemistry Dashboard: a community data resource for environmental chemistry, J Cheminform 9 (2017) 61. 10.1186/s13321-017-0247-6.
  • [30] van der Maaten L, Hinton G, Visualizing Data using t-SNE, Journal of Machine Learning Research 9 (2008) 2579–2605.
  • [31] Bishop CM, Svensén M, Williams CKI, GTM: The Generative Topographic Mapping, Neural Computation 10 (1998) 215–234. 10.1162/089976698300017953.
  • [32] Yang C, Tarkhov A, Marusczyk J, Bienfait B, Gasteiger J, Kleinoeder T, Magdziarz T, Sacher O, Schwab CH, Schwoebel J, Terfloth L, Arvidson K, Richard A, Worth A, Rathman J, New publicly available chemical query language, CSRML, to support chemotype representations for application to data mining and modeling, J Chem Inf Model 55 (2015) 510–528. 10.1021/ci500667v.
  • [33].Wang J, Hallinger DR, Murr AS, Buckalew AR, Lougee RR, Richard AM, Laws SC, Stoker TE, High-throughput screening and chemotype-enrichment analysis of ToxCast phase II chemicals evaluated for human sodium-iodide symporter (NIS) inhibition, Environ Int 126 (2019) 377–386. 10.1016/j.envint.2019.02.024. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [34].Wang J, Richard AM, Murr AS, Buckalew AR, Lougee RR, Shobair M, Hallinger DR, Laws SC, Stoker TE, Expanded high-throughput screening and chemotype-enrichment analysis of the phase II: e1k ToxCast library for human sodium-iodide symporter (NIS) inhibition, Arch Toxicol 95 (2021) 1723–1737. 10.1007/s00204-021-03006-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [35].Benjamini Y, Hochberg Y, Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing, Journal of the Royal Statistical Society: Series B (Methodological) 57 (1995) 289–300. 10.1111/j.2517-6161.1995.tb02031.x. [DOI] [Google Scholar]
  • [36].Liu J, Patlewicz G, Williams AJ, Thomas RS, Shah I, Predicting Organ Toxicity Using in Vitro Bioactivity Data and Chemical Structure, Chem Res Toxicol 30 (2017) 2046–2059. 10.1021/acs.chemrestox.7b00084. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [37].Pradeep P, Carlson LM, Judson R, Lehmann GM, Patlewicz G, Integrating data gap filling techniques: A case study predicting TEFs for neurotoxicity TEQs to facilitate the hazard assessment of polychlorinated biphenyls, Regul Toxicol Pharmacol 101 (2019) 12–23. 10.1016/j.yrtph.2018.10.013. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [38].Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, Vanderplas J, Passos A, Cournapeau D, Brucher M, Perrot M, Duchesnay É, Scikit-learn: Machine Learning in Python, J. Mach. Learn. Res 12 (2011) 2825–2830. [Google Scholar]
  • [39].Müller A, Guido S, Introduction to Machine Learning with Python: A Guide for Data Scientists, 1st edition, O’Reilly Media, Sebastopol, CA, 2016. [Google Scholar]
  • [40].Géron A, Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems, 1st edition, O’Reilly Media, Beijing; Boston, 2017. [Google Scholar]
  • [41].Puzyn T, Mostrag-Szlichtyng A, Gajewicz A, Skrzyński M, Worth AP. Investigating the Influence of Data Splitting on the Predictive Ability of QSAR/QSPR Models. Struct. Chem 22 (2011) 795–804. doi: 10.1007/s11224-011-9757-4 [DOI] [Google Scholar]
  • [42].van Tilborg D, Alenicheva A, Grisoni F. Exposing the limitations of molecular machine learning with activity cliffs. J. Chem. Inf. Model 62 (2022) 5938–5951. doi: 10.1021/acs.jcim.2c01073. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [43].Halder AK, Giri AK, Cordeiro MNDS. Multi-Target chemometric modelling, fragment analysis and virtual screening with ERK inhibitors as potential anticancer agents. Molecules 24 (2019) 3909. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [44].Kennard RW, Stone LA. Computer Aided Design of Experiments. Technometrics 11 (1969) 137–148. [Google Scholar]
  • [45].Gramatica P, Principles of QSAR models validation: internal and external, QSAR & Combinatorial Science 26 (2007) 694–701. 10.1002/qsar.200610151. [DOI] [Google Scholar]
  • [46].Rakhimbekova A, Madzhidov TI, Nugmanov RI, Gimadiev TR, Baskin II, Varnek A, Comprehensive Analysis of Applicability Domains of QSPR Models for Chemical Reactions, International Journal of Molecular Sciences 21 (2020) 5542. 10.3390/ijms21155542. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [47].Harris CR, Millman KJ, van der Walt SJ, Gommers R, Virtanen P, Cournapeau D, Wieser E, Taylor J, Berg S, Smith NJ, Kern R, Picus M, Hoyer S, van Kerkwijk MH, Brett M, Haldane A, del Río JF, Wiebe M, Peterson P, Gérard-Marchant P, Sheppard K, Reddy T, Weckesser W, Abbasi H, Gohlke C, Oliphant TE, Array programming with NumPy, Nature 585 (2020) 357–362. 10.1038/s41586-020-2649-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [48].Reback J, McKinney W, jbrockmendel, Van den Bossche J, Augspurger T, Cloud P, gfyoung, Sinhrks, Hawkins S, Roeschke M, Klein A, Petersen T, Tratner J, She C, Ayd W, Naveh S, Garcia M, Schendel J, Hayden A, Saxton D, patrick, Jancauskas V, McMaster A, Battiston P, Seabold S, Gorelli M, Dong K, chris-b1, h-vetinari, Hoyer S, pandas-dev/pandas: Pandas 1.2.1, (2021). 10.5281/zenodo.4452601. [DOI]
  • [49].Hunter JD, Matplotlib: A 2D Graphics Environment, Computing in Science & Engineering 9 (2007) 90–95. 10.1109/MCSE.2007.55. [DOI] [Google Scholar]
  • [50].Waskom ML, seaborn: statistical data visualization, Journal of Open Source Software 6 (2021) 3021. 10.21105/joss.03021. [DOI] [Google Scholar]
  • [51].Landrum G, RDKit: Open-source cheminformatics; http://www.rdkit.org, (n.d.).
  • [52].Gaspar HA, ugtm: A Python Package for Data Modeling and Visualization Using Generative Topographic Mapping, Journal of Open Research Software 6 (2018) 26. 10.5334/jors.235. [DOI] [Google Scholar]
  • [53].Mielke PW, Berry KJ, Description of MRPP, in: Mielke PW, Berry KJ (Eds.), Permutation Methods: A Distance Function Approach, Springer, New York, NY, 2001: pp. 9–65. 10.1007/978-1-4757-3449-2_2. [DOI] [Google Scholar]
  • [54].Benigni R, Giuliani A, Franke R, Gruska A, Quantitative Structure−Activity Relationships of Mutagenic and Carcinogenic Aromatic Amines, Chem. Rev 100 (2000) 3697–3714. 10.1021/cr9901079. [DOI] [PubMed] [Google Scholar]
  • [55].Benigni R, Bossa C, Structure alerts for carcinogenicity, and the Salmonella assay system: a novel insight through the chemical relational databases technology, Mutat Res 659 (2008) 248–261. 10.1016/j.mrrev.2008.05.003. [DOI] [PubMed] [Google Scholar]
  • [56].Roberts DW, Aptula AO, Patlewicz G, Electrophilic chemistry related to skin sensitization. Reaction mechanistic applicability domain classification for a published data set of 106 chemicals tested in the mouse local lymph node assay, Chem Res Toxicol 20 (2007) 44–60. 10.1021/tx060121y. [DOI] [PubMed] [Google Scholar]
  • [57].Jimenez-Luna J, Grisoni F, Schneider G. Drug discovery with explainable artificial intelligence. Nature Machine Intelligence 2 (2020) 573–584. [Google Scholar]
  • [58].Gimeno A, Ojeda-Montes MJ, Tomás-Hernández S, Cereto-Massagué A, Beltrán-Debón R, Mulero M, Pujadas G, Garcia-Vallvé S. The Light and Dark Sides of Virtual Screening: What Is There to Know? Int J Mol Sci 20 (2019) 1375. doi: 10.3390/ijms20061375. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [59].Helguera AM, Gonzalez MP, Briones JR. TOPS-MODE approach to predict mutagenicity in dental monomers. Polymer 45 (2004) 2045–2050. [Google Scholar]
  • [60].Karpov P, Godin G, Tetko IV. Transformer-CNN: Swiss knife for QSAR modelling and interpretation. J. Cheminformatics 12 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [61].Cruz-Monteagudo M, Dias Soeiro Cordeiro MN. Chemoinformatics Profiling of Ionic Liquids-Uncovering Structure-Cytotoxicity Relationships with Network-like Similarity Graphs. Toxicol. Sci 138 (2014) 191–204. [DOI] [PubMed] [Google Scholar]
  • [62].Sosnin S, Karlov D, Tetko IV, Fedorov MV. Comparative study of Multitask toxicity modelling on a broad chemical space. J. Chem. Inf. Model 59 (2019) 1062–1072. [DOI] [PubMed] [Google Scholar]
  • [63].Halder AK, Moura AS, Cordeiro N. Moving Average-Based MultiTasking in Silico Classification Modeling: Where do we stand and what is next? Int. J. Mol. Sci 23 (2022) 4937. doi: 10.3390/ijms23094937 [DOI] [PMC free article] [PubMed] [Google Scholar]

