Comparing LD₅₀/LC₅₀ Machine Learning Models for Multiple Species

Thomas R Lane; Joshua Harris; Fabio Urbina; Sean Ekins

doi:10.1021/acs.chas.2c00088

Author manuscript; available in PMC: 2024 Mar 27.

Published in final edited form as: J Chem Health Saf. 2023 Feb 23;30(2):83–97. doi: 10.1021/acs.chas.2c00088

PMCID: PMC10348353 NIHMSID: NIHMS1904611 PMID: 37457397

Abstract

The lethal dose or concentration that kills 50% of the test animals (LD₅₀ or LC₅₀) is an important parameter for understanding the toxicity of chemicals in different scenarios; it can be used to make go/no-go decisions and ultimately to help choose the right personal protective equipment needed for containment. LD₅₀ assessment has also required the use of many animals, although modern methods have reduced the number of rats needed. Because a compound is usually considered highly toxic when its LD₅₀ is lower than 25 mg/kg, such a classification provides potentially valuable safety information to synthetic chemists and other safety assessment scientists. Finding alternative approaches, such as computational methods, is important to reduce animal use for this testing still further. We now summarize our efforts to use public data to build in vivo LD₅₀ or LC₅₀ classification and regression machine learning models for various species (rat, mouse, fish and daphnia), report their 5-fold cross validation statistics with different machine learning algorithms, and evaluate an external curated test set for mouse LD₅₀. These datasets consist of different molecule classes, may cover different activity ranges, and vary in size. One challenge in using such computational models is that their applicability domain must be understood so that they can make reliable predictions for novel molecules. These machine learning models will also need to be backed up by experimental validation. However, such models could also be used to bridge gaps in individual toxicity datasets. Making such models available also opens them up to potential misuse or dual use.
We propose that these models could be used to score the millions of commercially available molecules, most of which likely have no known LD₅₀ or, for that matter, any in vitro or in vivo toxicity data.

Keywords: Acute toxicity, Classification, Dual use, in silico predictions, LD₅₀, Machine learning, Regression

INTRODUCTION

The determination of whether a new molecule may be toxic to humans, or for that matter to another species, is an important question for risk assessment across many industries. Yet it is likely that only a very small fraction of the millions of currently available molecules have had their in vitro or in vivo toxicity determined and the data made available. Acute toxicity is expressed as the median lethal dose (LD₅₀) or median lethal concentration (LC₅₀)¹, which is used to assess and communicate the acute toxicity of a molecule². In the past, assessing LD₅₀ required very large numbers of animals and led to extensive guidelines for testing chemicals from the Organization for Economic Cooperation and Development (OECD)³. Several modern methods for LD₅₀ determination have focused on the rat and generally use significantly fewer animals². Acute toxicity occurs following a short exposure to a substance, and adverse effects such as impairment or biochemical lesions affecting the whole organism may follow within 24 hours⁴. Depending on the compound class, there is poor correlation between in vitro assays for toxicity or enzyme inhibition and acute oral toxicity owing to toxicokinetics⁵. In 2021 the European Parliament voted to phase out animal use in testing and research, placing more emphasis on the further development of new approach methodologies (NAMs), including in vitro and computational methods, which must be validated before adoption⁶.

It is important that such computational models comply with the OECD guidelines for quantitative structure-activity relationship (QSAR) models⁷, under which any model used for regulatory purposes needs to have:

a defined endpoint;
an unambiguous algorithm;
a defined domain of applicability;
appropriate measures of goodness-of-fit, robustness and predictivity;
a mechanistic interpretation, if possible.

The computational methods that have been most widely used to meet regulatory requirements when assessing industrial chemicals and pesticides include QSAR and read-across (e.g., OECD-Toolbox, OASIS and Derek)⁸.

Numerous computational or in silico models have been developed to predict acute toxicity using previously generated in vivo data, and these are considered acceptable by many researchers and regulatory agencies¹,⁹⁻²⁵. A recently curated dataset for rat acute oral toxicity from the NTP Interagency Center for the Evaluation of Alternative Toxicological Methods (NICEATM) and the EPA National Center for Computational Toxicology (NCCT)²⁶,²⁷ has been used by various groups to generate machine learning models with different algorithms and molecular descriptors²⁸,²⁹. This facilitated the development of the Collaborative Acute Toxicity Modeling Suite (CATMoS), which generates consensus predictions from the models contributed by the different groups and thereby leverages the collective strengths of each individual model. CATMoS demonstrated high accuracy and robustness when compared with in vivo results²⁶,²⁷. The CATMoS models have also been tested in the pharmaceutical industry on 371 Bristol Myers Squibb compounds, where they separated molecules with undesirable LD₅₀ values (LD₅₀ < 300 mg/kg) from those with low acute oral toxicity (LD₅₀ > 2000 mg/kg)³⁰.

Our group participated in this collaboration and initially used the CATMoS dataset to generate binary Bayesian classification models³¹ with the first version of our in-house software, Assay Central®³²⁻³⁴. We generated classification models because most of the literature provided an upper dose limit of 2000 or 5000 mg/kg. Eight models were built based on different categories and were tested using an additional published dataset³⁵ as a validation set. Acute oral toxicity has been modeled extensively using many different approaches¹⁶,²¹,³⁵⁻⁴². Classification models²⁸,²⁹,⁴³ have been widely used, and each research group has used different approaches for assessing their models. For our initial efforts using rat acute oral toxicity with the Bayesian algorithm and extended connectivity fingerprint 6 (ECFP6) descriptors, we used balanced accuracy⁴⁴, which ranged from 0.61 to 0.84 for the external performance of our eight models. Additional machine learning models (naïve Bayesian, AdaBoosted decision trees, random forest, k-nearest neighbors, support vector classification and deep learning methods) were also compared using five-fold cross-validation metrics for this dataset⁴⁴.
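Balanced accuracy is the mean of sensitivity and specificity, so a model cannot score well simply by predicting the majority class, which matters for imbalanced toxicity datasets. A minimal Python sketch, assuming binary labels with 1 = toxic (active) and 0 = non-toxic; the function name is illustrative and is not taken from Assay Central:

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity (true positive rate) and specificity (true negative rate).

    Assumes binary labels: 1 = active (toxic), 0 = inactive.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Toy example: 2 actives and 8 inactives, with every compound predicted inactive.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [0] * 10
print(balanced_accuracy(y_true, y_pred))  # 0.5
```

In the toy example plain accuracy would be 0.8 even though no toxic compound is detected, whereas balanced accuracy drops to 0.5.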

Machine learning models can therefore enable prioritization of in vivo testing for acute oral toxicity in rat, and likely in other species, reducing the number of animals required for testing and making it more cost effective. While there has been considerable focus on models for rat acute oral toxicity, likely due to the large amount of data available, few computational models have been described for other important toxicological species. For this reason, we now describe the curation of in vivo LD₅₀ or LC₅₀ data for mouse, fish and daphnia as well as for rat, which has enabled us to build classification and regression machine learning models and evaluate them with 5-fold cross validation. In addition, the curation of these datasets has enabled correlations across several species and datasets, which go beyond the recently described comparisons of LD₅₀ values in mice and rats for various exposure routes⁴⁵.

EXPERIMENTAL SECTION

Data curation

Datasets for each species were downloaded from various databases as noted. Entries without a numerical value in the “LC₅₀” or “LD₅₀” value column were removed. In some cases, specific routes of administration were also selected. Each dataset was sanitized using our proprietary software “E-Clean”, which uses open-source RDKit tools to remove duplicate compounds and salts and to neutralize charges. For the mouse and rat regression models, LD₅₀ values were converted to −log[mg/kg] and then averaged for duplicate compounds prior to model building. For the aquatic toxicity classification models, all but the lowest value per compound were discarded (i.e., the most sensitive assay/species was retained), and compounds were classified based on these values. Any values with a “>” qualifier near the high toxicity threshold (1 mg/L) were discarded as ambiguous. An additional step was taken for the freshwater fish datasets used to build regression models: if multiple experiments were performed for the same compound and species, the geometric mean of the LC₅₀ was calculated. These averaged values were then used to select the most sensitive species. This was done to mitigate outliers or incorrect data that may have been present. As daphnid LC₅₀ can be calculated using multiple metrics, no averaging was done for these datasets.
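The aggregation rules described above can be sketched in a few lines of Python. This is a minimal illustration, not the E-Clean or Assay Central implementation: it assumes records have already been reduced to canonical SMILES, assumes −log means −log₁₀, and the function names are hypothetical.

```python
import math
from collections import defaultdict
from statistics import fmean, geometric_mean


def regression_targets(records):
    """Rat/mouse regression: convert each LD50 (mg/kg) to -log10 and average duplicates.

    records: iterable of (canonical_smiles, ld50_mg_per_kg).
    """
    per_compound = defaultdict(list)
    for smiles, ld50 in records:
        per_compound[smiles].append(-math.log10(ld50))
    return {smiles: fmean(values) for smiles, values in per_compound.items()}


def most_sensitive_species(records):
    """Fish regression: geometric mean of repeat LC50s per compound/species pair,
    then keep the lowest (most sensitive species) value per compound.

    records: iterable of (canonical_smiles, species, lc50_mg_per_l).
    """
    per_pair = defaultdict(list)
    for smiles, species, lc50 in records:
        per_pair[(smiles, species)].append(lc50)
    lowest = {}
    for (smiles, _species), values in per_pair.items():
        lowest[smiles] = min(geometric_mean(values), lowest.get(smiles, math.inf))
    return lowest


fish = [("CCO", "fathead minnow", 4.0), ("CCO", "fathead minnow", 1.0),
        ("CCO", "rainbow trout", 3.0)]
# The fathead minnow geometric mean (about 2.0) is below the trout value (3.0),
# so roughly 2.0 is kept for CCO.
print(most_sensitive_species(fish))
```

Averaging within a species before taking the minimum means a single unusually low replicate cannot by itself decide which species is treated as most sensitive.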

For the aquatic toxicity classification models, compounds were binarized using high and low toxicity thresholds (≤1 and ≥100 mg/L, respectively) applied to the LC₅₀ of the most sensitive species. When datasets from different sources were combined, the lowest LC₅₀ value was used for classification. Datasets were further standardized within the latest version of the Assay Central software, which uses the Indigo Toolkit⁴⁶.
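The two binary labelings can be sketched as follows. The sketch assumes that each model is trained on every compound and that compounds failing its threshold are labeled inactive, so a compound between 1 and 100 mg/L is inactive in both; this reading is our assumption, although it is consistent with the active counts in Table 1 summing to fewer than the total compounds. Names are hypothetical.

```python
HIGH_TOX_MAX = 1.0    # mg/L; LC50 at or below this -> active in the "high toxicity" model
LOW_TOX_MIN = 100.0   # mg/L; LC50 at or above this -> active in the "low/no toxicity" model


def binarize(lowest_lc50):
    """Build two label dicts {smiles: 0 or 1} from the lowest (most sensitive) LC50
    per compound, one for each classification model."""
    high = {s: int(v <= HIGH_TOX_MAX) for s, v in lowest_lc50.items()}
    low = {s: int(v >= LOW_TOX_MIN) for s, v in lowest_lc50.items()}
    return high, low


high, low = binarize({"A": 0.5, "B": 20.0, "C": 150.0})
# A is active only in the high toxicity set, C only in the low/no toxicity set,
# and B (between the thresholds) is inactive in both.
print(high, low)
```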

Mouse

Mouse datasets were downloaded from ChEMBL (CHEMBL375) using the LD₅₀ associated bioactivities filter (Table 1). Only data for which the route of administration could be confirmed were retained. Data were curated independently for intravenous (IV), subcutaneous (SC), intramuscular (IM), oral gavage (oral) and intraperitoneal (IP) administration.

Table 1.

Summary of LD₅₀ and LC₅₀ datasets used in this study.

Endpoint	Animal Model	Source	Route of Administration	High toxicity (≤1 mg/L; ≤EPA II)	Low/no toxicity (≥100 mg/L; >EPA II)	Total Compounds
LC₅₀	Fish	ECOTOX	N/A	613	426	1986
LC₅₀	Fish	ECOTOX + MOE of Japan	N/A	820	624	2823
LC₅₀	Fish	⁵²	N/A	678	534	2303
LC₅₀	Fish	ECOTOX + MOE of Japan + ⁵²	N/A	880	664	2983
LC₅₀	Daphnid	ECOTOX	N/A	229	82	572
LC₅₀	Daphnid	ECOTOX + ⁵³	N/A	123	428	932
LC₅₀	Daphnid	ECOTOX + ⁵³ + MOE of Japan	N/A	345	484	1377
LD₅₀	Rat (Regression)	²⁷	Oral	N/A	N/A	8,397
LD₅₀	Rat (Classification)	²⁷	Oral	3444	7,853	11,297
LD₅₀	Mouse (Regression)	ChEMBL	IV	N/A	N/A	409
LD₅₀	Mouse (Regression)	ChEMBL	SC	N/A	N/A	91
LD₅₀	Mouse (Regression)	ChEMBL	IM	N/A	N/A	126
LD₅₀	Mouse (Regression)	ChEMBL	IP	N/A	N/A	1,376
LD₅₀	Mouse (Regression)	ChEMBL	Oral	N/A	N/A	803
LD₅₀	Mouse (Classification)	ChEMBL	Oral	330	473	803


Rat

The acute rat oral toxicity datasets are from the publication “CATMoS: Collaborative Acute Toxicity Modeling Suite”²⁷, with their training and evaluation sets combined. For classification modeling, the training set included all compounds with a defined EPA category. Following processing in Assay Central, the dataset had 11,297 compounds, with 3444 actives based on an activity threshold of ≤ EPA II. For the continuous (regression) models, training sets comprised only compounds with defined LD₅₀ values. After processing, these datasets had 8397 unique compounds with a total range of 0.12 – 71,000 mg/kg.
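The ≤ EPA II activity threshold can be illustrated by mapping an oral LD₅₀ to the EPA acute oral toxicity categories (I: ≤50 mg/kg; II: >50 to ≤500 mg/kg; III: >500 to ≤5000 mg/kg; IV: >5000 mg/kg). This sketch assumes those standard EPA cut-offs and uses hypothetical function names; it does not handle compounds that carry an EPA category without a discrete LD₅₀, which the classification set includes.

```python
from bisect import bisect_left

# Inclusive upper bounds (mg/kg) of EPA acute oral categories I, II and III;
# anything above 5000 mg/kg falls in category IV.
EPA_UPPER_BOUNDS = [50.0, 500.0, 5000.0]
EPA_LABELS = ["I", "II", "III", "IV"]


def epa_category(ld50_mg_per_kg):
    """Map an oral LD50 (mg/kg) to an EPA category.

    bisect_left places a value equal to a bound in the lower category,
    e.g. exactly 50 mg/kg -> "I" and exactly 500 mg/kg -> "II".
    """
    return EPA_LABELS[bisect_left(EPA_UPPER_BOUNDS, ld50_mg_per_kg)]


def is_active(ld50_mg_per_kg):
    """Binary label for the classification model: 1 if the category is I or II (<= EPA II)."""
    return int(epa_category(ld50_mg_per_kg) in ("I", "II"))


print([epa_category(v) for v in (10, 300, 2000, 9000)])  # ['I', 'II', 'III', 'IV']
```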

Fish

The ECOTOX data for fish were originally obtained from the ECOTOX Knowledgebase website⁴⁷. The criteria used to create the output were: group: fish; endpoints: LD₅₀/LC₅₀; observation duration: between 3.7813 and 4.1667 days (~96 hrs). Because each export was limited to 10,000 entries, the data were exported as separate files covering different year ranges; after export these were combined and spanned the years 1915–2022. SMILES were then added based on the CAS number, using a batch lookup on the EPA CompTox dashboard⁴⁸,⁴⁹. Note that the dashes had been removed from the CAS numbers, so looking them up in PubChem⁵⁰ was not possible. Compounds for which SMILES were not found by CAS were looked up by the name found via the CompTox dashboard or PubChem. All compounds without available SMILES were removed. SMILES were canonicalized using E-Clean, and then all but the lowest value per compound were discarded (reducing ~24,000 entries to ~2100). This dataset contained 2118 compounds, some of which were inorganic. Following the removal of inorganics and duplicates, the final dataset had 1986 compounds. Two datasets were built representing high and low/no toxicity: ≤1 mg/L and ≥100 mg/L, respectively.
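Purely as an illustration, and not a step used in this study, dashless CAS Registry Numbers can in principle be re-formatted because the format is fixed: the last digit is a check digit, the two digits before it form the middle group, and the remaining digits form the first group. The check digit is the sum of the other digits, each multiplied by its position counted leftwards from the check digit, modulo 10. The function name is hypothetical.

```python
import re


def format_cas(digits):
    """Re-insert dashes into a dashless CAS RN, e.g. '7732185' -> '7732-18-5'.

    Returns None when the input is not 5-10 ASCII digits or the check digit fails.
    """
    digits = digits.strip()
    if not re.fullmatch(r"[0-9]{5,10}", digits):
        return None
    body, check = digits[:-1], int(digits[-1])
    # Weights 1, 2, 3, ... run leftwards from the digit next to the check digit.
    total = sum(w * int(d) for w, d in enumerate(reversed(body), start=1))
    if total % 10 != check:
        return None
    return f"{body[:-2]}-{body[-2:]}-{check}"


print(format_cas("7732185"))  # 7732-18-5 (water)
print(format_cas("64175"))    # 64-17-5 (ethanol)
```

Validating the check digit before re-inserting dashes guards against silently producing a well-formed but wrong identifier from a corrupted entry.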

We also combined the ECOTOX dataset with the Ecotoxicity Test Database from the Ministry of the Environment of Japan (MOE of Japan), which was downloaded from their website⁵¹. As for the ECOTOX datasets, after the data were combined all but the lowest value per compound were removed. Following the removal of inorganics and duplicates, the final dataset had 2823 compounds.

Additional fish toxicity data were obtained from a recent EPA paper⁵². This represents an extremely large dataset (~96,000 entries) that required considerable pruning. Similar criteria were used to create the dataset to make it compatible with the ECOSAR criteria: fish; endpoints: LC₅₀; observation duration: 4 days (96 hrs); and study type: mortality. SMILES were looked up using the DSSTox substance ID on the EPA CompTox dashboard. Compounds that were not found by DSSTox ID were looked up by name in PubChem. All compounds without available SMILES were removed; most of these were either mixtures or commercial products without available structures. SMILES were then canonicalized using E-Clean, and all but the lowest value per compound were discarded. Following cleanup in Assay Central this dataset had a total of 2303 compounds. Due to the size of this dataset, it was used to build its own independent model.

Finally, we combined all three datasets from ECOTOX, the MOE of Japan and the recent EPA paper⁵². After the curated data from these sources were combined, only the lowest value per compound was retained. After the removal of inorganics and duplicates, the final dataset had 2983 compounds. Two datasets were built representing high and low/no toxicity: ≤1 mg/L and ≥100 mg/L, respectively. The high and low/no toxicity datasets had 880 and 664 actives, respectively.

Daphnid

ECOTOX was also one of the sources for the daphnid LC₅₀ data. The criteria used to create the daphnia dataset from the ECOTOX database were: species: Daphnia magna; endpoints: LC₅₀; observation duration: between 2 and 2.125 days (~48 hrs); published between 1915 and 2022. SMILES were then added based on the CAS number, using a batch lookup on the EPA CompTox dashboard. Note that the dashes had been removed from the CAS numbers, so looking them up in other sources was not possible. Compounds for which SMILES were not found by CAS were looked up in PubChem by the name found via the CompTox dashboard. All compounds without available SMILES were removed. SMILES were canonicalized using our in-house software, and then all but the lowest value per compound were discarded. In a recent paper that used acute toxicity toward Daphnia magna to build machine learning models⁵³, data were reported in units of log₁₀(1/mM). To make these compatible, molecular weights (MWs) were calculated from the SMILES strings and used to convert the values to mg/L. Because salts had already been removed from these SMILES, the resulting mg/L values correspond to the free base form of each compound. While this may make the compatibility questionable, it is unavoidable without going back to the original source of the data. This dataset had a total of 440 compounds. Two datasets were built representing high and low/no toxicity: ≤1 mg/L and ≥100 mg/L, respectively. The high and low/no toxicity datasets had 2 and 418 actives, respectively, which made them too unbalanced to be useful for model building. We therefore combined this dataset with the ECOTOX dataset, and after combining the curated data from these sources only the lowest value per compound was retained. We also combined the ECOTOX Daphnia magna dataset, the data from this Daphnia magna machine learning study⁵³ and data from the Ecotoxicity Test Database of the MOE of Japan, downloaded from their website⁵¹.
As with the ECOTOX datasets, after combining the curated data from these sources only the lowest value per compound was retained. The dataset had a total of 1377 compounds after processing in Assay Central (Table 1).
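The unit conversion for the data from ref. ⁵³ reduces to one line once the molecular weight is known, because 1 mmol/L multiplied by a molecular weight in g/mol gives that many mg/L. This sketch assumes the reported value is p = log₁₀(1/[LC₅₀ in mM]), so that LC₅₀ (mM) = 10^(−p); the function name is hypothetical, and the MW would be computed from the salt-stripped SMILES.

```python
def log_inverse_mm_to_mg_per_l(p, mw_g_per_mol):
    """Convert p = log10(1 / LC50[mM]) to LC50 in mg/L.

    1 mmol/L * MW (g/mol) = MW mg/L, so mg/L = 10**(-p) * MW.
    """
    lc50_mm = 10.0 ** (-p)
    return lc50_mm * mw_g_per_mol


# p = 2 means LC50 = 0.01 mM; with MW = 200 g/mol this is about 2 mg/L,
# which would fall in the 1-100 mg/L band between the two class thresholds.
print(log_inverse_mm_to_mg_per_l(2.0, 200.0))
```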

Machine learning

Our proprietary software Assay Central, which integrates multiple machine learning algorithms to build classification and regression models and has been described in detail previously³⁴, was used for model building. The algorithms used included Bernoulli naïve Bayes, linear logistic regression, AdaBoost decision trees, random forest, support vector machine, deep learning (DL) and XGBoost. Model validation was performed using nested 5-fold cross validation. Nested 5-fold cross validation first selects a random, stratified 20% hold-out set that is removed from the training set prior to model building. The model is then built with the remaining 80% of the training data, and the hyperparameters (where applicable) are optimized by grid search using 5-fold dataset splits (20% validation sets); see also the Supplemental methods and Table S1. The optimized model is then used to predict the initial 20% hold-out set, and the process is repeated until every compound has been in a hold-out set (total 20 models trained). The final nested 5-fold cross validation scores are the average of the hold-out set metrics. Owing to its high computational requirements, DL uses a 20% leave-out set instead. Models were built using ECFP6 descriptors only, and metrics were generated as described previously³⁴. The applicability domain was calculated using the reliability-density neighborhood (RDN) method, which considers the model overlap and the individual bias and precision of the overlapping fingerprints⁵⁴.
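For readers unfamiliar with nested cross validation, the toy sketch below shows the structure described above: stratified outer folds used only for scoring, inner folds drawn from the outer training data for a grid search, and a refit on the full outer training data. It is not Assay Central code; the k-nearest neighbors classifier, the grid over k, the round-robin stratification and the synthetic data are all illustrative assumptions, and it does not attempt to reproduce the paper's count of trained models.

```python
import random
from collections import Counter
from statistics import fmean


def stratified_folds(labels, n_folds=5, seed=0):
    """Split sample indices into n_folds lists, dealing each class out
    round-robin so every fold keeps roughly the overall class ratio."""
    rng = random.Random(seed)
    folds = [[] for _ in range(n_folds)]
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        for j, i in enumerate(members):
            folds[j % n_folds].append(i)
    return folds


def knn_predict(train_X, train_y, x, k):
    """Toy classifier: majority vote of the k nearest training points (squared Euclidean)."""
    order = sorted(range(len(train_X)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)))
    return Counter(train_y[i] for i in order[:k]).most_common(1)[0][0]


def accuracy(train_X, train_y, test_X, test_y, k):
    hits = sum(knn_predict(train_X, train_y, x, k) == y for x, y in zip(test_X, test_y))
    return hits / len(test_y)


def nested_cv(X, y, grid=(1, 3, 5), n_folds=5, seed=0):
    """Outer folds estimate performance; inner folds, drawn only from the outer
    training part, choose k by grid search. Returns (mean outer accuracy, chosen ks)."""
    outer_scores, chosen = [], []
    for test_idx in stratified_folds(y, n_folds, seed):
        held_out = set(test_idx)
        tX = [X[i] for i in range(len(y)) if i not in held_out]
        ty = [y[i] for i in range(len(y)) if i not in held_out]
        inner = stratified_folds(ty, n_folds, seed + 1)

        def inner_score(k):
            scores = []
            for val_idx in inner:
                val = set(val_idx)
                fit = [i for i in range(len(ty)) if i not in val]
                scores.append(accuracy([tX[i] for i in fit], [ty[i] for i in fit],
                                       [tX[i] for i in val_idx], [ty[i] for i in val_idx], k))
            return fmean(scores)

        best_k = max(grid, key=inner_score)  # grid search on inner folds only
        chosen.append(best_k)
        # Refit on the full outer training part, then score the untouched hold-out fold.
        outer_scores.append(accuracy(tX, ty, [X[i] for i in test_idx],
                                     [y[i] for i in test_idx], best_k))
    return fmean(outer_scores), chosen


rng = random.Random(42)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(30)]
X += [[rng.gauss(3, 1), rng.gauss(3, 1)] for _ in range(20)]
y = [0] * 30 + [1] * 20
mean_acc, ks = nested_cv(X, y)
print(f"mean outer accuracy = {mean_acc:.2f}; k per outer fold = {ks}")
```

The key design point is that the hold-out fold never influences the choice of k, so the averaged outer score is not biased by hyperparameter tuning.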

External test set

An external test set was curated by searching PubMed for papers published in the last 2 years (2020–2022) using the query ‘mouse and LD₅₀’. These papers yielded 53 molecules not present in the training sets (Table 2), which were used as a test set for the mouse toxicity models and comprised mouse data for oral (47), IV (4) and IP (2) administration.

Table 2.

Literature curated test set for mouse LD₅₀ predictions. For molecule structures and predictions for each individual machine learning algorithm please see the Supplemental data table.

Molecule name	Reference	Route of Administration	EPA Category	Qualifier	LD₅₀ (mg/kg)	Average predicted LD₅₀, regression models for the route of administration (mg/kg)	Average prediction, classification models	Applicability domain, classification models
Gambierone	⁵⁷	IP		=	2	12.86	-	-
IMB-0523	⁵⁸	IP		=	448	202.09	-	-
2	⁵⁹	IV		=	42	21.93	-	-
Harmine	⁶⁰	IV		=	27	27.73	-	-
1	⁵⁹	IV		=	26	25.37	-	-
3	⁵⁹	IV		=	24	27.16	-	-
Mitomycin C	⁶¹	Oral	I	<	3	551.25	0	0.30
6	⁶¹	Oral	I	<	20	668.16	1	0.50
Gelsenicine*	⁶²	Oral	I	=	1	360.86	1	0.43
Dihydroanatoxin-a	⁶³	Oral	I	=	3	591.64	1	0.28
Anatoxin-a	⁶³	Oral	I	=	11	709.03	0	0.24
Dechloro-CPF*	⁶⁴	Oral	I	=	45	494.41	0	0.19
19l	⁶⁵	Oral	III	=	944	685.47	0	0.35
19m	⁶⁵	Oral	III	=	962	753.87	0	0.40
19k	⁶⁵	Oral	III	=	1001	607.87	0	0.37
19n	⁶⁵	Oral	III	=	1148	635.59	0	0.41
19j	⁶⁵	Oral	III	=	1155	712.26	0	0.42
20l	⁶⁵	Oral	III	=	1158	626.8	0	0.41
19h	⁶⁵	Oral	III	=	1176	740.48	0	0.44
19p	⁶⁵	Oral	III	=	1189	573.93	0	0.33
20b	⁶⁵	Oral	III	=	1203	795.03	0	0.45
20d	⁶⁵	Oral	III	=	1203	564.81	0	0.45
20f	⁶⁵	Oral	III	=	1243	868.19	0	0.46
20i	⁶⁵	Oral	III	=	1352	615.40	0	0.45
20n	⁶⁵	Oral	III	=	1386	599.58	0	0.43
20p	⁶⁵	Oral	III	=	1386	603.34	0	0.43
4k	⁶⁶	Oral	III	=	1565	509.90	0	0.38
4l	⁶⁶	Oral	III	=	1590	555.66	0	0.55
4a	⁶⁶	Oral	III	=	1625	554.75	0	0.38
4b	⁶⁶	Oral	III	=	1625	528.96	0	0.54
4d	⁶⁶	Oral	III	=	1634	521.79	0	0.52
4m	⁶⁶	Oral	III	=	1650	542.42	0	0.36
4n	⁶⁶	Oral	III	=	1700	564.55	0	0.38
4i	⁶⁶	Oral	III	=	1720	631.95	0	0.25
4c	⁶⁶	Oral	III	=	1728	574.29	0	0.52
4f	⁶⁶	Oral	III	=	1790	689.81	0	0.33
4h	⁶⁶	Oral	III	=	1790	662.78	0	0.38
4e	⁶⁶	Oral	III	=	1820	759.07	0	0.35
4g	⁶⁶	Oral	III	=	1820	644.21	0	0.33
4j	⁶⁶	Oral	III	=	1840	492.15	0	0.25
Nevirapine	⁶⁷	Oral	III	=	2154	560.35	0	0.46
Permethrin	⁶⁸	Oral	IV	>	500	693.57	0	0.30
DCA-O	⁶⁸	Oral	IV	>	500	746.64	0	0.32
DCA-01	⁶⁸	Oral	IV	>	500	601.33	0	0.28
DCA-11	⁶⁸	Oral	IV	>	500	719.26	0	0.25
29	⁶⁹	Oral	IV	>	500	420.49	1	0.25
Deltamethrin	⁶⁸	Oral	IV	>	500	564.69	1	0.27
F8	⁷⁰	Oral	IV	>	1000	777.45	0	0.50
iodophenyl 5-methyl-3-(p-tolyl)-1H-pyrazole-1-sulfonate	⁷¹	Oral	IV	>	2000	666.38	0	0.55
2-chlorophenyl 5-methyl-3-(p-tolyl)-1H-pyrazole-1-sulfonate	⁷¹	Oral	IV	>	2000	766.65	0	0.55
Jaranol	⁷²	Oral	IV	>	2000	719.96	0	0.42
(−)-Carveol	⁷³	Oral	IV	>	2500	464.85	1	0.26
Estragole	⁷⁴	Oral	IV	>	2500	500.88	1	0.26


* Sex difference averaged.

RESULTS

Rat

The CATMoS rat acute oral toxicity dataset²⁶,²⁷ consisted of 8,397 compounds (range 0.12 – 71,000 mg/kg) with discrete LD₅₀ values. As we have previously described classification models for this dataset³¹, we have now generated regression models as an alternative. The training and evaluation sets were combined, and only structures with LD₅₀ values were retained in order to build a continuous model (Figure 1). The highest R² values and lowest MAE were observed for support vector regression (R² = 0.47, MAE = 0.45) and random forest regression (R² = 0.46, MAE = 0.47). To allow comparison with the external test set predictions, a classification model was also built using the source dataset. The classification training dataset was larger (11,297 compounds) because it included compounds that had been assigned an EPA category even when no discrete LD₅₀ value was available.
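R² and MAE are computed on the log-transformed LD₅₀ values. A minimal sketch, assuming paired observed and predicted values already on the log₁₀[mg/kg] scale and defining R² as 1 − SS_res/SS_tot; the function names are hypothetical and this is not the Assay Central code:

```python
from statistics import fmean


def mae(observed, predicted):
    """Mean absolute error, in the same (log10) units as the inputs."""
    return fmean(abs(o - p) for o, p in zip(observed, predicted))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = fmean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


obs = [1.0, 2.0, 3.0, 4.0]
pred = [1.5, 2.0, 2.5, 4.0]
print(mae(obs, pred), r_squared(obs, pred))  # 0.25 0.9
```

Because the values are log₁₀ units, an MAE of about 0.45 corresponds to predictions that are, on average, within roughly a factor of 10^0.45 ≈ 2.8 of the measured LD₅₀.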

Figure 1. Rat acute oral toxicity classification (A) and regression (B) models built with ECFP6 (1028/2048 bits for classification/regression): nested 5-fold cross validation statistics (log₁₀[mg/kg]) shown for the training dataset, which has 8,397 compounds with a total range of 0.12 – 71,000 mg/kg. (rfr = random forest regression, knnr = k-nearest neighbors regression, svr = support vector machine regression, br = Naïve Bayesian regression, adar = AdaBoosted decision trees regression, xgbr = xgboost regression, lreg = linear regression).

Mouse

We generated regression models for subcutaneously (SC; Figure S1; 91 compounds; range 0.13 – 4000 mg/kg), orally (Figure 2; 803 compounds; range 2 – 6000 mg/kg), intravenously (Figure S2; 409 compounds; range 0.45 ng/kg – 1,621 mg/kg), intraperitoneally (Figure 3; 1,376 compounds; range 3.5 ng/kg – 6,500 mg/kg) and intramuscularly (Figure S3; 126 compounds; range 1.20 – 1611 mg/kg) dosed mice. The larger datasets tended to produce models with higher R² values (e.g., oral dosing: knnr 0.4, svr 0.38 and Bayesian regression 0.34; intraperitoneal dosing: knnr 0.78, svr 0.77, rfr 0.77 and Bayesian regression 0.74). For comparative purposes we also created a mouse acute oral toxicity classification model with a threshold of <500 mg/kg (EPA Category II).

Figure 3. IP dosed mice LD₅₀ regression models built with ECFP6 (2048 bits): nested 5-fold cross validation statistics (log₁₀[mg/kg]) shown for the training dataset, which has 1,376 compounds; range 3.5 ng/kg – 6,500 mg/kg. (rfr = random forest regression, knnr = k-nearest neighbors regression, svr = support vector machine regression, br = Naïve Bayesian regression, adar = AdaBoosted decision trees regression, xgbr = xgboost regression, lreg = linear regression).

Comparison of mouse and rat

We identified 52 molecules shared between the mouse and rat datasets for which LD₅₀ data for oral administration were available. Correlation analysis showed a statistically significant correlation (Pearson R = 0.74; Figure 4).
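A sketch of Pearson's R on log₁₀-transformed LD₅₀ values, as a stand-in for the GraphPad Prism calculation. Applying the log₁₀ transform here is our assumption (the Figure 7 legend states it for the aquatic comparison), outlier removal is not reproduced, and the function name is hypothetical.

```python
import math
from statistics import fmean


def pearson_r_log10(x_values, y_values):
    """Pearson correlation of log10-transformed paired values (e.g. mouse vs rat LD50)."""
    xs = [math.log10(v) for v in x_values]
    ys = [math.log10(v) for v in y_values]
    mx, my = fmean(xs), fmean(ys)
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sx = math.sqrt(sum((a - mx) ** 2 for a in xs))
    sy = math.sqrt(sum((b - my) ** 2 for b in ys))
    return cov / (sx * sy)


mouse = [10.0, 100.0, 1000.0]
rat = [20.0, 200.0, 2000.0]
# A constant 2-fold species difference becomes a constant offset on the log scale,
# so the correlation is (numerically) 1.
print(pearson_r_log10(mouse, rat))
```

Working on the log scale keeps the handful of very large LD₅₀ values from dominating the correlation, since the data span several orders of magnitude.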

Figure 4. Correlation of 52 overlapping compounds tested for acute toxicity via oral administration in both mouse and rat. Correlation was calculated using GraphPad Prism 9.4.1.

Fish

The final acute fish toxicity dataset from the ECOTOX Knowledgebase had 1986 compounds. The high and low/no general fish acute toxicity datasets from ECOTOX had 613 and 426 actives, respectively. The deep learning, svc and knn models performed well for the high toxicity dataset, whereas knn slightly outperformed the other methods for the low/no toxicity dataset (Figure S4).

Additional fish toxicity data were obtained from a recent EPA paper⁵², and an individual model was built from this dataset. This represented an extremely large dataset (~96,000 entries) that required considerable pruning. Following the removal of inorganics and duplicates, the final dataset had 2303 compounds, with the high and low/no toxicity datasets having 678 and 534 actives, respectively; DL and xgb performed well for the high toxicity model, and DL and svc for the low/no toxicity model (Figure S5).

We also combined the ECOTOX dataset and the Ecotoxicity Test Database from the MOE of Japan. Following the removal of inorganics and duplicates, the final dataset had 2823 compounds. Two datasets were built representing high and low/no toxicity: ≤1 mg/L and ≥100 mg/L, respectively. The numbers of actives for the high and low/no toxicity models were 820 and 624, respectively. Cross validation statistics for these models were generally poor, except for rf for the high toxicity dataset and rf and svc for the low/no toxicity dataset (Figure S6).

Finally, we combined all three datasets: the ECOTOX dataset, the Ecotoxicity Test Database from the MOE of Japan and the recent EPA paper⁵². The final dataset had 2821 compounds, with the high and low/no toxicity datasets having 818 and 624 actives, respectively. Cross validation statistics for these models are shown in Figure 5. Overall, rf, svc and xgb had the best cross validation statistics in both the high and low/no toxicity models.

Figure 5. ECOTOX, MOE and EPA paper fish dataset classification models representing high (a) and low/no (b) toxicity (≤1 mg/L and ≥100 mg/L, respectively), showing nested 5-fold cross validation statistics. (DL = deep learning, ada = AdaBoosted decision trees, bnb = Bernoulli naïve Bayes, knn = k-nearest neighbors, lreg = logistic regression, rf = random forest, svc = support vector machine, xgb = xgboost).

Daphnia

The ECOTOX dataset contained 679 compounds, some of which were inorganic. Following the removal of inorganics and duplicates, the final dataset had 574 compounds. Two datasets were built representing high and low/no toxicity: ≤1 mg/L and ≥100 mg/L, respectively. The high and low/no toxicity datasets had 231 and 82 actives, respectively, with xgb appearing slightly better than the other methods for the high toxicity dataset and rf for the low/no toxicity dataset (Figure S7). The dataset concatenating the ECOTOX data and data from a recent publication⁵³ had a total of 934 compounds after final processing. Two datasets were built representing high and low/no toxicity: ≤1 mg/L and ≥100 mg/L, respectively. The high and low/no toxicity datasets had 233 and 428 actives, respectively. All methods generally performed well on all cross validation statistics, with rf slightly better for the high toxicity dataset and svc for the low/no toxicity dataset (Figure S8).

The largest model for acute daphnid toxicity, which combined data from three different sources, had a total of 1377 compounds after processing in Assay Central. Two datasets were built representing high and low/no toxicity: ≤1 mg/L and ≥100 mg/L, respectively. The high and low/no toxicity datasets had 345 and 484 actives, respectively. In cross validation, svc slightly outperformed the other methods for the high toxicity dataset, while DL and svc performed best for the low/no toxicity dataset (Figure 6).

Figure 6. Classification models built from the ECOTOX and MOE Daphnia magna datasets and data from a recent paper using acute toxicity toward Daphnia magna for machine learning models⁵³, representing high (a) and low/no (b) toxicity (≤1 mg/L and ≥100 mg/L, respectively), showing nested 5-fold cross validation statistics. (DL = deep learning, ada = AdaBoosted decision trees, bnb = Bernoulli naïve Bayes, knn = k-nearest neighbors, lreg = logistic regression, rf = random forest, svc = support vector machine, xgb = xgboost).

Comparison of freshwater fish and daphnid

We identified 727 molecules shared between the freshwater fish and daphnid datasets for which LC₅₀ data were available. Correlation analysis showed a statistically significant correlation (Pearson R = 0.68; Figure 7).

Figure 7. Correlation of 727 overlapping compounds tested for acute toxicity against freshwater fish and daphnid. Data were log transformed prior to calculation of the Pearson correlation, and outliers (Q = 1%) were removed prior to calculation. Correlation was calculated using GraphPad Prism 9.4.1.

Fish Chronic Toxicity

Calculating chronic toxicity values (ChV) required both the no observed effect concentration (NOEC) and the lowest observed effect concentration (LOEC) for the same compound from the same experiment (i.e., the same test conditions, including duration and test concentrations, for the same species of animal/algae). To calculate these, we matched the NOEC and LOEC per compound by experiment and calculated the geometric mean of each matched pair. For data from the EPA’s ECOTOX website, SMILES identification and dataset pruning were done in the same manner as previously described: SMILES were added based on the CAS number using a batch lookup on the EPA CompTox dashboard; compounds for which SMILES were not found by CAS were identified in PubChem by the name found via the CompTox dashboard; all compounds without available SMILES were removed; and SMILES were canonicalized using our in-house software, after which all but the lowest value per compound were discarded. The criteria used to create the output were: group: fish; endpoints: NOEC/LOEC. Because each export was limited to 10,000 entries, the data were exported as separate files covering different year ranges; after export these were combined, spanned the years 1915–2022 and comprised ~100,000 entries. The geometric mean was calculated for each compound where both NOEC and LOEC were available for the same experiment, and after removal of compounds without SMILES and duplicates, 1069 unique compounds remained. Following the removal of inorganics and duplicates, the final dataset had 984 compounds. Two datasets were built representing high and low/no toxicity: ≤0.1 mg/L and ≥10 mg/L, respectively. The high and low/no toxicity datasets had 436 and 174 actives, respectively. Cross validation statistics of these models are shown in Figure S9, with DL performing best for the high toxicity dataset and no clear winner for the low/no toxicity dataset.
We then combined the ECOTOX data with additional fish toxicity data obtained from a recent paper⁵²; after curation of the data from this source, only the lowest value per compound was retained. The dataset had a total of 1087 compounds after processing in Assay Central. Two datasets were built representing high and low/no toxicity: ≤0.1 mg/L and ≥10 mg/L, respectively. The high and low/no toxicity datasets had 458 and 217 actives, respectively. Cross validation statistics of these models are shown in Figure 8, with knn performing slightly better for the high toxicity dataset and DL for the low/no toxicity dataset.
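The chronic value is the geometric mean of a matched pair, ChV = √(NOEC × LOEC). A sketch, assuming each record carries a hypothetical experiment identifier that links a NOEC to its LOEC, and keeping the lowest ChV per compound as for the other aquatic datasets; if an experiment reports an endpoint more than once, this simplified version keeps only the last value.

```python
import math
from collections import defaultdict


def chronic_values(records):
    """records: iterable of (smiles, experiment_id, endpoint, mg_per_l), endpoint 'NOEC' or 'LOEC'.

    Returns {smiles: lowest ChV}, using only experiments that report both endpoints.
    """
    by_experiment = defaultdict(dict)
    for smiles, exp_id, endpoint, value in records:
        by_experiment[(smiles, exp_id)][endpoint] = value  # last value wins
    lowest = {}
    for (smiles, _exp), ends in by_experiment.items():
        if "NOEC" in ends and "LOEC" in ends:
            chv = math.sqrt(ends["NOEC"] * ends["LOEC"])  # geometric mean of the pair
            lowest[smiles] = min(chv, lowest.get(smiles, math.inf))
    return lowest


records = [
    ("CCO", "e1", "NOEC", 1.0), ("CCO", "e1", "LOEC", 4.0),    # ChV = 2.0
    ("CCO", "e2", "NOEC", 9.0), ("CCO", "e2", "LOEC", 16.0),   # ChV = 12.0
    ("C", "e3", "NOEC", 0.5),                                  # no LOEC: skipped
]
print(chronic_values(records))  # {'CCO': 2.0}
```

The resulting ChV values would then be split at the ≤0.1 mg/L and ≥10 mg/L thresholds used for the chronic models.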

Figure 8. — ECOTOX, MOE and EPA paper fish chronic toxicity dataset classification models representing (a) high and (b) low/no toxicity (≤1mg/L and ≥100mg/L, respectively), showing nested 5-fold cross validation. (DL = deep learning, ada = AdaBoosted decision trees, bnb = Naïve Bayesian, knn = k-nearest neighbors, lreg = linear regression, rf = random forest, svc = support vector machine, xgb = xgboost).

Daphnia Chronic Toxicity

As described earlier, the criteria used to create the daphnia dataset from the ECOTOX database were: species: Daphnia; endpoints: NOEC/LOEC; published between 1915 and 2022. SMILES were then added based on the CAS number. The geometric mean was calculated for each compound where both NOEC and LOEC were available for the same experiment; after removal of compounds without SMILES and duplicates, 566 unique compounds remained. Following the removal of inorganics and duplicates, the final dataset had 515 compounds. Two datasets were built representing high and low/no toxicity (≤0.1 mg/L and ≥10 mg/L, respectively), with 231 and 80 actives, respectively. Cross validation statistics of these models are shown in Figure 9, with DL performing well for both the high and low/no toxicity datasets.

Figure 9. — ECOTOX daphnia chronic toxicity dataset classification models representing (a) high and (b) low/no toxicity (≤1mg/L and ≥100mg/L, respectively), showing nested 5-fold cross validation. (DL = deep learning, ada = AdaBoosted decision trees, bnb = Naïve Bayesian, knn = k-nearest neighbors, lreg = linear regression, rf = random forest, svc = support vector machine, xgb = xgboost).

Aquatic Toxicity regression models

We also generated regression models for fish and daphnia. For freshwater fish, we curated a dataset of 2657 molecules (range 1 ng/L – 62 g/L) from ECOTOX and fish toxicity data obtained from a recent paper⁵². This curation included an additional step: multiple values for a compound tested in the same species were first combined by geometric mean, and the most sensitive species was then chosen. This reduced variation due to experimental noise and allowed us to select the most sensitive species, as suggested by the EPA. Data were restricted to studies with a 96-hour window for LC₅₀, as per the ECOSAR suggestions (Figure S10). The best performing algorithms as measured by R² were svr (0.39), rfr (0.39) and br (0.39). For daphnia, datasets from ECOTOX, MOE and a recent paper that used acute toxicity toward Daphnia magna for machine learning models⁵³ were combined (1379 compounds; range 1 ng/L – 5.2 kg/L) and restricted to studies with a 48-hour window for LC₅₀, as per the ECOSAR suggestions (Figure S11). The best performing algorithm was svr (R² = 0.5).
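
The per-species geometric averaging and most-sensitive-species selection described above can be sketched as follows. This is an illustrative Python sketch, not the curation code used here; the record layout and species entries are hypothetical, and the exposure filter defaults to the 96-hour window used for fish (48 hours would be passed for daphnia).

```python
import math
from collections import defaultdict

# Hypothetical records: (smiles, species, exposure_hours, lc50_mg_per_L)
records = [
    ("CCO", "Pimephales promelas", 96, 8.0),
    ("CCO", "Pimephales promelas", 96, 2.0),
    ("CCO", "Oncorhynchus mykiss", 96, 5.0),
    ("CCO", "Oncorhynchus mykiss", 48, 0.1),  # not a 96 h study, so it is excluded
]


def most_sensitive_species(records, exposure_hours=96):
    """Geometric mean per (compound, species), then keep the lowest (most sensitive) species."""
    log_values = defaultdict(list)
    for smiles, species, hours, lc50 in records:
        if hours == exposure_hours:
            log_values[(smiles, species)].append(math.log(lc50))

    best = {}
    for (smiles, species), logs in log_values.items():
        geo_mean = math.exp(sum(logs) / len(logs))  # geometric mean via mean of logs
        if geo_mean < best.get(smiles, (math.inf, None))[0]:
            best[smiles] = (geo_mean, species)
    return best


result = most_sensitive_species(records)
print({s: (round(v, 3), sp) for s, (v, sp) in result.items()})
# {'CCO': (4.0, 'Pimephales promelas')}
```

Averaging within a species first means that one noisy replicate cannot by itself decide which species counts as most sensitive.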

Correlation between acute toxicity data for fish and rat

We identified 1339 molecules with acute toxicity data shared between freshwater fish and rat. Correlation analysis showed a statistically significant correlation (Pearson R = 0.32; Figure S12).
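
The significance claim can be checked with the standard t-test for a Pearson coefficient. The Python sketch below computes r for paired per-molecule values (whether Figure S12 uses log-transformed doses and concentrations is an assumption not stated here) and the test statistic t = r√((n−2)/(1−r²)), which follows Student's t distribution with n−2 degrees of freedom under zero correlation. Plugging in the reported r = 0.32 and n = 1339 gives t ≈ 12.35, far above the two-sided 5% critical value of about 1.96, while r² ≈ 0.10 indicates that only about 10% of the variance is shared.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t = r * sqrt((n - 2) / (1 - r^2)); Student's t with n - 2 df if the true correlation is 0."""
    return r * math.sqrt((n - 2) / (1 - r * r))


# Values reported in the text: r = 0.32 over 1339 shared molecules
print(round(t_statistic(0.32, 1339), 2))
print(round(0.32 ** 2, 4))  # fraction of variance shared
```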

External test set for mouse LD₅₀ models

Mouse LD₅₀ data for 53 molecules were obtained from 17 papers (Table 2) and used as an external test set for the IP, IV and oral regression models as well as the oral classification model. The dataset is rather small and limited in molecular diversity, containing several analogs (Table 2 and Supplemental data table). The 2 IV molecules show a correct rank order, the 4 IP molecules are predicted with similar scores by the IP model, and the oral molecules are all predicted similarly high by the oral regression model. The classification model does slightly better at predicting some of the more potent molecules, but there is room for improvement. This is seen more clearly when we compared the classification and regression models on 47 molecules with acute oral toxicity data (Figure 2C, D): the classification model has a higher balanced accuracy (0.70) than the regression model (0.63).
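
Balanced accuracy, the metric used above to compare the two models, is the mean of sensitivity and specificity. The Python sketch below computes it and shows how continuous regression outputs can be turned into class labels for such a comparison; the toy labels, predictions and the 500 mg/kg cut-off are hypothetical and are not the thresholds of the mouse models.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity (recall on toxic) and specificity (recall on non-toxic)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


def regression_to_labels(predicted_ld50, cutoff):
    """Call a molecule toxic (1) when its predicted LD50 (mg/kg) is at or below cutoff."""
    return [int(value <= cutoff) for value in predicted_ld50]


# Hypothetical toy data; the 500 mg/kg cutoff is illustrative only
y_true = [1, 1, 0, 0, 0, 0]
predicted_ld50 = [10.0, 600.0, 800.0, 1000.0, 2000.0, 300.0]
y_from_regression = regression_to_labels(predicted_ld50, cutoff=500.0)
print(balanced_accuracy(y_true, y_from_regression))  # 0.625
```

Because it averages per-class recall, balanced accuracy is not inflated by the large number of non-toxic molecules in a test set like this one.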

We have also assessed a method for identifying potential toxicophore features in the models generated. SHapley Additive exPlanations (SHAP)⁵⁵,⁵⁶ is a game-theoretic method for explaining the output of a machine learning model. It explains feature importance through the additive nature of SHAP values: the SHAP values of all input features sum to the difference between the current prediction and the baseline model output. Taking the mean absolute SHAP value of each feature over all test set predictions gives a global measure of feature importance. As an illustration, we performed SHAP analysis on our mouse LD₅₀ regression model to obtain the importance of each bit of the 2048-bit sparse fingerprint. We then ranked the bits by this global importance (Figure S13A) and retraced the fingerprint generation to recover the substructures that set each bit, capturing substructure importance including any bit collisions introduced by hash folding (Figure S13B). These substructures can also be visualized on an example molecule (Figure S13C). The top-ranked features appear chemically unremarkable; for example, feature 1840 describes a fluorine substituent and feature 1236 describes a tertiary amine. This may reflect a limitation of applying this approach to sparse fingerprints such as ECFP6.
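
The SHAP workflow described above can be illustrated with a toy model. The sketch below is not the authors' code: it replaces the trained mouse LD₅₀ regressor with a hypothetical linear model over random 16-bit fingerprints, because for a linear model with independent features the SHAP value of feature j has the closed form φⱼ = wⱼ(xⱼ − E[xⱼ]). It checks the additivity property (the SHAP values of a prediction sum to the prediction minus the baseline) and then ranks bits by mean absolute SHAP value, as in Figure S13A. For nonlinear models the shap package provides explainers such as TreeExplainer and KernelExplainer, and for ECFP-style fingerprints RDKit's Morgan fingerprint functions can record which atom environments set each bit via their bitInfo argument.

```python
import numpy as np

rng = np.random.default_rng(0)
n_bits = 16  # toy stand-in for the 2048-bit fingerprint
X_background = rng.integers(0, 2, size=(200, n_bits)).astype(float)
X_test = rng.integers(0, 2, size=(50, n_bits)).astype(float)
weights = rng.normal(size=n_bits)
bias = 0.5


def predict(X):
    """Hypothetical linear model standing in for the trained regressor."""
    return X @ weights + bias


def linear_shap(X, background):
    """Exact SHAP values for a linear model with independent features:
    phi_ij = w_j * (x_ij - E[x_j]), with E[x_j] taken over the background set."""
    return (X - background.mean(axis=0)) * weights


phi = linear_shap(X_test, X_background)
baseline = predict(X_background).mean()  # expected model output

# Additivity: each row of SHAP values sums to prediction minus baseline
print(np.allclose(phi.sum(axis=1), predict(X_test) - baseline))  # True

# Global importance: mean absolute SHAP value per bit, highest first
global_importance = np.abs(phi).mean(axis=0)
top_bits = np.argsort(global_importance)[::-1][:5]
print(top_bits)
```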

Chemical space analysis of LD₅₀ and LC₅₀ datasets

To visualize the chemical space covered by the LD₅₀ and LC₅₀ datasets for the four species in this study, we generated a UMAP plot⁷⁵ from ECFP6 descriptors, which shows that a similar chemical space has been explored for acute toxicity in each species (Figure 10).

Figure 10. — A UMAP plot generated from ECFP6 for ~19,000 compounds for which acute toxicity (LD₅₀ or LC₅₀) was assessed experimentally.

Although differences could not be readily distinguished using UMAP with ECFP6, we performed an additional analysis comparing multiple simple chemical descriptors, calculated using Chemaxon software (Budapest, Hungary), for the largest datasets, representing aquatic and mammalian acute toxicity. Figure 11A shows the same UMAP plot with only the freshwater fish and rat acute toxicity data. Of the six simple chemical descriptors, molecular weight, polar surface area, number of rotatable bonds and number of hydrogen bond acceptors show significant differences in both their distributions and mean values (Figure 11B).
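
The text does not state which statistical test was used for Figure 11B. As one possible approach, assumed here for illustration only, the sketch below applies a two-sided permutation test to the difference in mean values of a single descriptor between two groups; the molecular weights are hypothetical, and descriptor calculation with Chemaxon is not reproduced.

```python
import random


def permutation_test(a, b, n_permutations=2000, seed=0):
    """Two-sided permutation test for a difference in means between samples a and b."""
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    count = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign group labels
        perm_a, perm_b = pooled[:len(a)], pooled[len(a):]
        diff = abs(sum(perm_a) / len(perm_a) - sum(perm_b) / len(perm_b))
        if diff >= observed:
            count += 1
    # +1 in numerator and denominator counts the observed labelling itself
    return (count + 1) / (n_permutations + 1)


# Hypothetical molecular weights (g/mol) for two toy groups
fish_mw = [150.2, 180.1, 210.3, 165.0, 175.4, 190.8]
rat_mw = [320.5, 410.2, 355.9, 298.7, 380.1, 345.6]
print(permutation_test(fish_mw, rat_mw))
```

A permutation test makes no normality assumption, which suits skewed descriptor distributions; with several descriptors tested, a multiple-comparison correction would also be advisable.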

Figure 11. — A. UMAP plot of rat oral and freshwater fish toxicity data. B. Simple molecular descriptor distributions for rat and freshwater fish acute toxicity data.

DISCUSSION

The LD₅₀ or LC₅₀ value is used to understand the potential toxicity of chemicals and to aid regulatory decision making. The LD₅₀ is also an important parameter for chemists to understand the toxicity of the chemicals they synthesize, and hence to choose the right personal protective equipment. Many thousands of new compounds are synthesized daily for which this toxicity information is not readily available; this gap could be filled by NAMs such as machine learning algorithms that categorize the toxicity of new compounds based on LD₅₀ (or LC₅₀) values (e.g., highly toxic, toxic, or harmful). Since a compound is considered highly toxic when the LD₅₀ is lower than 25 mg/kg, such a general categorization could provide valuable safety information and help prioritize which compounds may need to be tested.

We have now described various regression and classification models generated with LD₅₀ and LC₅₀ datasets for rat, mouse, fish and daphnia (Figures 1–3, 5–6, 8–9) curated from multiple public sources. We have also shown that when the same compounds are compared across species there is an excellent relationship for mouse and rat acute oral toxicity (Figure 4, comparable to that described by others for a much larger dataset⁴⁵) and for freshwater fish and daphnid toxicity (Figure 7), as well as a weaker but statistically significant correlation between rat and fish (Pearson R = 0.32; Figure S12). We have also visualized the chemical property space of these datasets, showing their general overlap (Figure 10), and assessed the distributions of simple molecular descriptors (Figure 11), which show some differences between fish and rat, as an example.

The curation of a relatively small test set for mouse LD₅₀ data highlights some of the challenges we face: limited diversity (some natural products and larger numbers of synthetic analogs), few toxic molecules, and large numbers of non-toxic molecules that are predominantly scored similarly by the acute oral toxicity model (Table 2). However, we were able to use this test set to compare classification and regression models for mouse acute oral toxicity, which suggested the classification model may be slightly better. A larger, more diverse test set might be beneficial, as might a prospective validation set created by using the model/s to score a very large library of commercially available molecules and then testing selected predicted toxic and non-toxic molecules in mice (outside the scope of the current study and budget). Additional methods for visualizing important features in models should also be evaluated, as the SHAP values for the ECFP6 fingerprints did not reveal any potential toxicophores in the mouse LD₅₀ dataset (Figure S13), likely due to the sparsity of the fingerprints.

These LD₅₀ and LC₅₀ machine learning models can be integrated into a single tool, which we have called MegaTox. The availability of these different datasets also enables read-across analysis for new molecules that have no LD₅₀ or LC₅₀ toxicity data. We could also use such models to score very large collections of molecules (such as DNA-encoded libraries) that would be impossible to test in vivo⁷⁶. This approach could be used to prioritize compounds for testing and could be integrated into chemical safety websites and other tools. Such machine learning models could also be integrated into chemical detection hardware to identify the potential threat posed by novel molecules in the environment when combined with a sensitive analytical detection system. Doing this reliably would require a suitable applicability domain to ensure that a prediction is for a chemical covered by the chemical property space of the model, so that the toxicity prediction is reliable. Many applicability domain methods, such as Euclidean, city block, Tanimoto, Mahalanobis, Hotelling's T², leverage and others, can be used to measure the distance from a training set to a test molecule⁷⁷⁻⁷⁹. With models that can predict toxicity from molecular structure alone also comes the great responsibility to ensure that they are not misused and that the potential for any dual use is minimized, by narrowing the scope of the models to predict fewer molecules or by restricting access⁸⁰⁻⁸².
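
Two of the applicability domain measures listed above can be sketched in a few lines of Python with NumPy. The sketch below is illustrative rather than the method used in MegaTox: it computes the highest Tanimoto similarity between a query fingerprint and the training fingerprints, and the leverage h = xᵀ(XᵀX)⁻¹x of a query descriptor vector together with the commonly used warning limit h* = 3p/n (p model parameters including the intercept, n training molecules). Any similarity cut-off for flagging a query as out of domain would be an additional, user-chosen assumption.

```python
import numpy as np


def max_tanimoto(query_bits, train_bits):
    """Highest Tanimoto similarity between a 0/1 query fingerprint and any training fingerprint."""
    query_bits = np.asarray(query_bits)
    train_bits = np.asarray(train_bits)
    shared = train_bits @ query_bits                             # |A AND B| per training molecule
    union = train_bits.sum(axis=1) + query_bits.sum() - shared   # |A OR B|
    similarity = shared / np.maximum(union, 1)                   # empty-vs-empty counted as 0
    return float(similarity.max())


def leverage(query_desc, train_desc):
    """Leverage h = x^T (X^T X)^-1 x with an intercept column, plus warning limit h* = 3p/n."""
    train_desc = np.asarray(train_desc, dtype=float)
    X = np.column_stack([np.ones(len(train_desc)), train_desc])
    x = np.concatenate([[1.0], np.asarray(query_desc, dtype=float)])
    h = float(x @ np.linalg.pinv(X.T @ X) @ x)
    h_star = 3 * X.shape[1] / X.shape[0]  # p = number of model parameters incl. intercept
    return h, h_star


train_fp = np.array([[1, 1, 0, 0], [0, 0, 1, 1]])
print(max_tanimoto([1, 0, 0, 0], train_fp))  # 0.5

rng = np.random.default_rng(1)
train_desc = rng.normal(size=(50, 3))
print(leverage(train_desc.mean(axis=0), train_desc))       # at the centroid, h equals 1/n
print(leverage(train_desc.mean(axis=0) + 10, train_desc))  # far away, h exceeds h*
```

A query at the centroid of the training descriptors has leverage 1/n, and leverage grows as the query moves away from the training data, which is why values above h* are usually treated as extrapolations.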

In conclusion, we have demonstrated that, after careful data curation, LD₅₀ or LC₅₀ machine learning regression and classification models with generally good nested 5-fold cross validation statistics can be generated for rat, mouse, fish and daphnia. These models provide a starting point for future safety assessment applications, further testing, validation and improvement.

Supplementary Material


NIHMS1904611-supplement-supplementary_material.docx (4.6 MB, docx)

ACKNOWLEDGMENTS

We kindly thank Dr. Wei Wang for the invitation to submit an article on this topic and our colleagues including Jacob Gerlach for their support and assistance developing the software described herein. Dr. Diedrich Bermudez is also kindly acknowledged for bringing the aquatic toxicity datasets to our attention.

Grant information

We kindly acknowledge NIH funding: 2R44GM122196-04A1 from NIGMS and 2R44ES031038-02A1 from NIEHS.

ABBREVIATIONS USED

ada: AdaBoosted decision trees
adar: AdaBoosted decision trees regression
AUC: Area-under-the-curve
CATMoS: Collaborative Acute Toxicity Modeling Suite
DL: deep learning
EPA: US Environmental Protection Agency
ECFP6: extended connectivity fingerprint 6
knn: k-nearest neighbors
knnr: k-nearest neighbors regression
LC₅₀: lethal concentration
LD₅₀: lethal dose
lreg: linear regression
MCC: Matthews Correlation Coefficient
bnb: Naïve Bayesian
br: Naïve Bayesian regression
NAMs: new approach methodologies
NICEATM: NTP Interagency Center for the Evaluation of Alternative Toxicological Methods
NCCT: EPA National Center for Computational Toxicology
OECD: Organization for Economic Cooperation and Development
QSAR: quantitative structure activity relationships
rf: random forest
rfr: random forest regression
svc: support vector machine
svr: support vector machine regression
xgb: xgboost
xgbr: xgboost regression

Footnotes

Conflicts of interest

S.E. is the owner, and all other authors are employees, of Collaborations Pharmaceuticals, Inc., which licenses the software described herein.

Statement on dual use

The acute oral toxicity machine learning models described in this study have potential dual use capabilities, and we therefore propose to implement restrictions to control who has access to these models and limit the number of molecules predicted when used on a website. We believe such precautions are necessary and these will evolve over time as we integrate software features to limit and prevent this dual use.

SUPPORTING INFORMATION

Further supporting details, including graphs, tables and molecule files, are available free of charge via the Internet at http://pubs.acs.org.

REFERENCES

1. Strickland J; Clippinger AJ; Brown J; Allen D; Jacobs A; Matheson J; Lowit A; Reinke EN; Johnson MS; Quinn MJ Jr.; Mattie D; Fitzpatrick SC; Ahir S; Kleinstreuer N; Casey W, Status of acute systemic toxicity testing requirements and data uses by U.S. regulatory agencies. Regul Toxicol Pharmacol 2018, 94, 183–196.
2. Morris-Schaffer K; McCoy MJ, A Review of the LD50 and Its Current Role in Hazard Communication. ACS Chemical Health & Safety 2021, 28 (1), 25–33.
3. Anon OECD GUIDELINE FOR TESTING OF CHEMICALS. https://ntp.niehs.nih.gov/iccvam/suppdocs/feddocs/oecd/oecd_gl425-508.pdf (accessed Sept 29th 2020).
4. Walum E, Acute oral toxicity. Environ Health Perspect 1998, 106 Suppl 2, 497–503.
5. Wang L; Ding J; Shi P; Fu L; Pan L; Tian J; Cao D; Jiang H; Ding X, Ensemble machine learning to evaluate the in vivo acute oral toxicity and in vitro human acetylcholinesterase inhibitory activity of organophosphates. Arch Toxicol 2021, 95 (7), 2443–2457.
6. Zainzinger V, Can Europe replace animal testing of chemicals? Chemical and Engineering News 2022, 100. https://cen.acs.org/biological-chemistry/toxicology/Europe-replace-animal-testing-chemicals/100/i28 (accessed Feb 1st 2022).
7. OECD OECD principles for the validation, for regulatory purposes, of (quantitative) structure-activity relationship models https://www.oecd.org/chemicalsafety/risk-assessment/37849783.pdf (accessed Sept 29th 2020).
8. Stucki AO; Barton-Maclaren TS; Bhuller Y; Henriquez JE; Henry TR; Hirn C; Miller-Holt J; Nagy EG; Perron MM; Ratzlaff DE; Stedeford TJ; Clippinger AJ, Use of new approach methodologies (NAMs) to meet regulatory requirements for the assessment of industrial chemicals and pesticides for effects on human health. Front Toxicol 2022, 4, 964553.
9. Anon REACH. https://ec.europa.eu/environment/chemicals/reach/reach_en.htm (accessed Sept 29th 2020).
10. EPA Endocrine Disruptor Screening Program Tier 1 Battery of Assays. https://www.epa.gov/endocrine-disruption/endocrine-disruptor-screening-program-tier-1-battery-assays (accessed Sept 29th 2020).
11. Anon ICCVAM, A Strategic Roadmap for Establishing New Approaches to Evaluate the Safety of Chemicals and Medical Products in the United States, 2018. https://ntp.niehs.nih.gov/whatwestudy/niceatm/natl-strategy/index.html?utm_source=direct&utm_medium=prod&utm_campaign=ntpgolinks&utm_term=natl-strategy (accessed Sept 29th 2020).
12. Zakarya D; Larfaoui EM; Boulaamail A; Lakhlifi T, Analysis of structure-toxicity relationships for a series of amide herbicides using statistical methods and neural network. SAR QSAR Environ Res 1996, 5 (4), 269–79.
13. Eldred DV; Jurs PC, Prediction of acute mammalian toxicity of organophosphorus pesticide compounds from molecular structure. SAR QSAR Environ Res 1999, 10 (2–3), 75–99.
14. Enslein K, A toxicity estimation model. J Environ Pathol Toxicol 1978, 2 (1), 115–21.
15. Enslein K; Lander TR; Tomb ME; Craig PN, A predictive model for estimating rat oral LD50 values. Toxicol Ind Health 1989, 5 (2), 261–387.
16. Martin TM; Lilavois CR; Barron MG, Prediction of pesticide acute toxicity using two-dimensional chemical descriptors and target species classification. SAR QSAR Environ Res 2017, 28 (6), 525–539.
17. Sazonovas A; Japertas P; Didziapetris R, Estimation of reliability of predictions and model applicability domain evaluation in the analysis of acute toxicity (LD50). SAR QSAR Environ Res 2010, 21 (1), 127–48.
18. Lagunin A; Zakharov A; Filimonov D; Poroikov V, QSAR Modelling of Rat Acute Toxicity on the Basis of PASS Prediction. Mol Inform 2011, 30 (2–3), 241–50.
19. Hamadache M; Benkortbi O; Hanini S; Amrane A; Khaouane L; Si Moussa C, A Quantitative Structure Activity Relationship for acute oral toxicity of pesticides on rats: Validation, domain of application and prediction. J Hazard Mater 2016, 303, 28–40.
20. Xu Y; Pei J; Lai L, Deep Learning Based Regression and Multiclass Models for Acute Oral Toxicity Prediction with Automatic Chemical Feature Extraction. J Chem Inf Model 2017, 57 (11), 2672–2685.
21. Lei T; Li Y; Song Y; Li D; Sun H; Hou T, ADMET evaluation in drug discovery: 15. Accurate prediction of rat oral acute toxicity using relevance vector machine and consensus modeling. J Cheminform 2016, 8, 6.
22. Zhu H; Martin TM; Ye L; Sedykh A; Young DM; Tropsha A, Quantitative structure-activity relationship modeling of rat acute toxicity by oral exposure. Chem Res Toxicol 2009, 22 (12), 1913–21.
23. Zhu H; Ye L; Richard A; Golbraikh A; Wright FA; Rusyn I; Tropsha A, A novel two-step hierarchical quantitative structure-activity relationship modeling work flow for predicting acute toxicity of chemicals in rodents. Environ Health Perspect 2009, 117 (8), 1257–64.
24. Li X; Chen L; Cheng F; Wu Z; Bian H; Xu C; Li W; Liu G; Shen X; Tang Y, In silico prediction of chemical acute oral toxicity using multi-classification methods. J Chem Inf Model 2014, 54 (4), 1061–9.
25. Kleinstreuer NC; Karmaus A; Mansouri K; Allen DG; Fitzpatrick JM; Patlewicz G, Predictive Models for Acute Oral Systemic Toxicity: A Workshop to Bridge the Gap from Research to Regulation. Comput Toxicol 2018, 8 (11), 21–24.
26. Mansouri K; Karmaus A; Fitzpatrick J; Patlewicz G; Pradeep P; Alberga D; Alepee N; Allen TEH; Allen D; Alves VM; Andrade CH; Auernhammer TR; Ballabio D; Bell S; Benfenati E; Bhattacharya S; Bastos JV; Boyd S; Brown JB; Capuzzi SJ; Chushak Y; Ciallella H; Clark AM; Consonni V; Daga PR; Ekins S; Farag S; Fedorov M; Fourches D; Gadaleta D; Gao F; Gearhart JM; Goh G; Goodman JM; Grisoni F; Grulke CM; Hartung T; Hirn M; Karpov P; Korotcov A; Lavado GJ; Lawless M; Li X; Luechtefeld T; Lunghini F; Mangiatordi GF; Marcou G; Marsh D; Martin T; Mauri A; Muratov EN; Myatt GJ; Nguyen DT; Nicolotti O; Note R; Pande P; Parks AK; Peryea T; Polash A; Rallo R; Roncaglioni A; Rowlands C; Ruiz P; Russo D; Sayed A; Sayre R; Sheils T; Siegel C; Silva AC; Simeonov A; Sosnin S; Southall N; Strickland J; Tang Y; Teppen B; Tetko IV; Thomas D; Tkachenko V; Todeschini R; Toma C; Tripodi I; Trisciuzzi D; Tropsha A; Varnek A; Vukovic K; Wang Z; Wang L; Waters KM; Wedlake AJ; Wijeyesakere SJ; Wilson D; Xiao Z; Yang H; Zahoranszky-Kohalmi G; Zakharov AV; Zhang FF; Zhang Z; Zhao T; Zhu H; Zorn KM; Casey W; Kleinstreuer NC, Erratum: CATMoS: Collaborative Acute Toxicity Modeling Suite. Environ Health Perspect 2021, 129 (7), 79001.
27. Mansouri K; Karmaus AL; Fitzpatrick J; Patlewicz G; Pradeep P; Alberga D; Alepee N; Allen TEH; Allen D; Alves VM; Andrade CH; Auernhammer TR; Ballabio D; Bell S; Benfenati E; Bhattacharya S; Bastos JV; Boyd S; Brown JB; Capuzzi SJ; Chushak Y; Ciallella H; Clark AM; Consonni V; Daga PR; Ekins S; Farag S; Fedorov M; Fourches D; Gadaleta D; Gao F; Gearhart JM; Goh G; Goodman JM; Grisoni F; Grulke CM; Hartung T; Hirn M; Karpov P; Korotcov A; Lavado GJ; Lawless M; Li X; Luechtefeld T; Lunghini F; Mangiatordi GF; Marcou G; Marsh D; Martin T; Mauri A; Muratov EN; Myatt GJ; Nguyen DT; Nicolotti O; Note R; Pande P; Parks AK; Peryea T; Polash AH; Rallo R; Roncaglioni A; Rowlands C; Ruiz P; Russo DP; Sayed A; Sayre R; Sheils T; Siegel C; Silva AC; Simeonov A; Sosnin S; Southall N; Strickland J; Tang Y; Teppen B; Tetko IV; Thomas D; Tkachenko V; Todeschini R; Toma C; Tripodi I; Trisciuzzi D; Tropsha A; Varnek A; Vukovic K; Wang Z; Wang L; Waters KM; Wedlake AJ; Wijeyesakere SJ; Wilson D; Xiao Z; Yang H; Zahoranszky-Kohalmi G; Zakharov AV; Zhang FF; Zhang Z; Zhao T; Zhu H; Zorn KM; Casey W; Kleinstreuer NC, CATMoS: Collaborative Acute Toxicity Modeling Suite. Environ Health Perspect 2021, 129 (4), 47013.
28. Alberga D; Trisciuzzi D; Mansouri K; Mangiatordi GF; Nicolotti O, Prediction of Acute Oral Systemic Toxicity Using a Multifingerprint Similarity Approach. Toxicol Sci 2019, 167 (2), 484–495.
29. Gadaleta D; Vuković K; Toma C; Lavado GJ; Karmaus AL; Mansouri K; Kleinstreuer NC; Benfenati E; Roncaglioni A, SAR and QSAR modeling of a large collection of LD50 rat acute oral toxicity data. Journal of Cheminformatics 2019, 11 (1), 58.
30. Graham JC; Rodas M; Hillegass J; Schulze G, The performance, reliability and potential application of in silico models for predicting the acute oral toxicity of pharmaceutical compounds. Regul Toxicol Pharmacol 2021, 119, 104816.
31. Minerali E; Foil DH; Zorn KM; Ekins S, Evaluation of Assay Central® Machine Learning Models for Rat Acute Oral Toxicity Prediction. ACS Sustain Chem Eng 2020, 8, 16020–16027.
32. Zorn KM; Lane TR; Russo DP; Clark AM; Makarov V; Ekins S, Multiple Machine Learning Comparisons of HIV Cell-based and Reverse Transcriptase Data Sets. Mol Pharm 2019, 16 (4), 1620–1632.
33. Russo DP; Zorn KM; Clark AM; Zhu H; Ekins S, Comparing Multiple Machine Learning Algorithms and Metrics for Estrogen Receptor Binding Prediction. Mol Pharm 2018, 15 (10), 4361–4370.
34. Lane T; Russo DP; Zorn KM; Clark AM; Korotcov A; Tkachenko V; Reynolds RC; Perryman AL; Freundlich JS; Ekins S, Comparing and Validating Machine Learning Models for Mycobacterium tuberculosis Drug Discovery. Mol Pharm 2018, 15 (10), 4346–4360.
35. Lunghini F; Marcou G; Azam P; Horvath D; Patoux R; Van Miert E; Varnek A, Consensus models to predict oral rat acute toxicity and validation on a dataset coming from the industrial context. SAR QSAR Environ Res 2019, 30 (12), 879–897.
36. Polishchuk P; Tinkov O; Khristova T; Ognichenko L; Kosinskaya A; Varnek A; Kuz’min V, Structural and Physico-Chemical Interpretation (SPCI) of QSAR Models and Its Comparison with Matched Molecular Pair Analysis. J Chem Inf Model 2016, 56 (8), 1455–69.
37. Lu J; Peng J; Wang J; Shen Q; Bi Y; Gong L; Zheng M; Luo X; Zhu W; Jiang H; Chen K, Estimation of acute oral toxicity in rat using local lazy learning. J Cheminform 2014, 6, 26.
38. Fan T; Sun G; Zhao L; Cui X; Zhong R, QSAR and Classification Study on Prediction of Acute Oral Toxicity of N-Nitroso Compounds. Int J Mol Sci 2018, 19 (10).
39. Vukovic K; Gadaleta D; Benfenati E, Methodology of aiQSAR: a group-specific approach to QSAR modelling. J Cheminform 2019, 11 (1), 27.
40. Hao Y; Sun G; Fan T; Tang X; Zhang J; Liu Y; Zhang N; Zhao L; Zhong R; Peng Y, In vivo toxicity of nitroaromatic compounds to rats: QSTR modelling and interspecies toxicity relationship with mouse. J Hazard Mater 2020, 399, 122981.
41. Russo DP; Strickland J; Karmaus AL; Wang W; Shende S; Hartung T; Aleksunes LM; Zhu H, Nonanimal Models for Acute Toxicity Evaluations: Applying Data-Driven Profiling and Read-Across. Environ Health Perspect 2019, 127 (4), 47001.
42. Edwards SW; Nelms M; Hench VK; Ponder J; Sullivan K, Mapping Mechanistic Pathways of Acute Oral Systemic Toxicity Using Chemical Structure and Bioactivity Measurements. Front Toxicol 2022, 4, 824094.
43. Li X; Kleinstreuer NC; Fourches D, Hierarchical Quantitative Structure-Activity Relationship Modeling Approach for Integrating Binary, Multiclass, and Regression Models of Acute Oral Systemic Toxicity. Chem Res Toxicol 2020, 33 (2), 353–366.
44. Brodersen KH; Ong CS; Stephan KE; Buhmann JM, The Balanced Accuracy and Its Posterior Distribution. In 2010 20th International Conference on Pattern Recognition, Istanbul, Turkey, 2010.
45. Wang Y; Wang S; Feng XN; Yan LC; Zheng SS; Wang Y; Zhao YH, The impact of exposure route for class-based compounds: a comparative approach of lethal toxicity data in rodent models. Drug Chem Toxicol 2018, 41 (1), 95–104.
46. Lane TR; Urbina F; Rank L; Gerlach J; Riabova O; Lepioshkin A; Kazakova E; Vocat A; Tkachenko V; Cole S; Makarov V; Ekins S, Machine Learning Models for Mycobacterium tuberculosis In Vitro Activity: Prediction and Target Visualization. Mol Pharm 2022, 19 (2), 674–689.
47. EPA ECOTOX Knowledgebase. https://cfpub.epa.gov/ecotox/search.cfm (accessed Feb 1st 2022).
48. EPA EPA Comptox dashboard. https://comptox.epa.gov/dashboard (accessed Feb 1st 2022).
49. Williams AJ; Grulke CM; Edwards J; McEachran AD; Mansouri K; Baker NC; Patlewicz G; Shah I; Wambaugh JF; Judson RS; Richard AM, The CompTox Chemistry Dashboard: a community data resource for environmental chemistry. J Cheminform 2017, 9 (1), 61.
50. Kim S; Chen J; Cheng T; Gindulyte A; He J; He S; Li Q; Shoemaker BA; Thiessen PA; Yu B; Zaslavsky L; Zhang J; Bolton EE, PubChem in 2021: new data content and improved web interfaces. Nucleic Acids Res 2021, 49 (D1), D1388–D1395.
51. Ministry of the Environment, Japan. Results of Eco-toxicity tests of chemicals conducted by Ministry of the Environment in Japan. https://www.env.go.jp/chemi/sesaku/02e.pdf (accessed Feb 1st 2022).
52. Sheffield TY; Judson RS, Ensemble QSAR Modeling to Predict Multispecies Fish Toxicity Lethal Concentrations and Points of Departure. Environ Sci Technol 2019, 53 (21), 12793–12802.
53. Tinkov OV; Grigorev VY; Razdolsky AN; Grigoryeva LD; Dearden JC, Effect of the structural factors of organic compounds on the acute toxicity toward Daphnia magna. SAR QSAR Environ Res 2020, 31 (8), 615–641.
54. Aniceto N; Freitas AA; Bender A; Ghafourian T, A novel applicability domain technique for mapping predictive reliability across the chemical space of a QSAR: reliability-density neighbourhood. Journal of Cheminformatics 2016, 8 (1), 69.
55. Lundberg SM; Lee S-I, A Unified Approach to Interpreting Model Predictions. In Advances in Neural Information Processing Systems; Guyon I; von Luxburg U; Bengio S; Wallach H; Fergus R; Vishwanathan S; Garnett R, Eds. Curran Associates, Inc.: 2017.
56. Rodríguez-Pérez R; Bajorath J, Interpretation of machine learning models using shapley values: application to compound potency and multi-target activity predictions. Journal of Computer-Aided Molecular Design 2020, 34 (10), 1013–1026.
57. Murray JS; Finch SC; Puddick J; Rhodes LL; Harwood DT; van Ginkel R; Prinsep MR, Acute Toxicity of Gambierone and Quantitative Analysis of Gambierones Produced by Cohabitating Benthic Dinoflagellates. Toxins (Basel) 2021, 13 (5), 333.
58. Cui AL; Sun WF; Zhong ZJ; Jin J; Xue ST; Wu S; Li YH; Li ZR, Synthesis and Bioactivity of N-(4-Chlorophenyl)-4-Methoxy-3-(Methylamino) Benzamide as a Potential Anti-HBV Agent. Drug Des Devel Ther 2020, 14, 3723–3729.
59. Trenin AS; Isakova EB; Treshchalin MI; Polozkova VA; Mirchink EP; Panov AA; Simonov AY; Bychkova OP; Tatarskiy VV; Lavrenov SN, Evaluation of New Antimicrobial Agents Based on tris(1H-Indol-3-yl)methylium Salts: Activity, Toxicity, Suppression of Experimental Sepsis in Mice. Pharmaceuticals (Basel) 2022, 15 (2), 118.
60. Lv Y; Liang H; Li J; Li X; Tang X; Gao S; Zou H; Zhang J; Wang M; Xiao L, Central inhibition prevents the in vivo acute toxicity of harmine in mice. J Toxicol Sci 2021, 46 (6), 289–301.
61. Huang D; Zhang M; Zhang H; Cui Z; Luo D; Li T; Li X; He Y; Zhang SL, Design and synthesis of TPP+-Mitomycin C conjugate with reduced toxicity. Bioorg Med Chem Lett 2022, 77, 129036.
62. Li YJ; Yang K; Long XM; Xiao G; Huang SJ; Zeng ZY; Liu ZY; Sun ZL, Toxicity assessment of gelsenicine and the search for effective antidotes. Hum Exp Toxicol 2022, 41, 9603271211062857.
63. Puddick J; van Ginkel R; Page CD; Murray JS; Greenhough HE; Bowater J; Selwood AI; Wood SA; Prinsep MR; Truman P; Munday R; Finch SC, Acute toxicity of dihydroanatoxin-a from Microcoleus autumnalis in comparison to anatoxin-a. Chemosphere 2021, 263, 127937.
64. Jiang F; Wu W; Zhu Z; Zhu S; Wang H; Zhang L; Fan Z; Chen Y, Structure identification and toxicity evaluation of one newly-discovered dechlorinated photoproducts of chlorpyrifos. Chemosphere 2022, 301, 134822.
65. Li X; Li T; Zhang P; Li X; Lu L; Sun Y; Zhang B; Allen S; White L; Phillips J; Zhu Z; Yao H; Xu J, Discovery of novel hybrids containing clioquinol-1-benzyl-1,2,3,6-tetrahydropyridine as multi-target-directed ligands (MTDLs) against Alzheimer’s disease. Eur J Med Chem 2022, 244, 114841.
66. Vasincu IM; Apotrosoaei M; Constantin S; Butnaru M; Verestiuc L; Lupusoru CE; Buron F; Routier S; Lupascu D; Tauser RG; Profire L, New ibuprofen derivatives with thiazolidine-4-one scaffold with improved pharmaco-toxicological profile. BMC Pharmacol Toxicol 2021, 22 (1), 10.
67. Ahmadu PU; Victor E; Ameh FS, Studies on some neuropharmacological properties of Nevirapine in mice. IBRO Neurosci Rep 2022, 12, 12–19.
68. Zhu Q; Yang Y; Lao Z; Zhong Y; Zhang K; Zhao S, Acute and chronic toxicity of deltamethrin, permethrin, and dihaloacetylated heterocyclic pyrethroids in mice. Pest Manag Sci 2020, 76 (12), 4210–4221.
69. Li M; She X; Ou Y; Liu J; Yuan Z; Zhao QS, Design, synthesis and biological evaluation of a new class of Hsp90 inhibitors vibsanin C derivatives. Eur J Med Chem 2022, 244, 114844.
70. Feng Y; Lu Y; Li J; Zhang H; Li Z; Feng H; Deng X; Liu D; Shi T; Jiang W; He Y; Zhang J; Wang Z, Design, synthesis and biological evaluation of novel o-aminobenzamide derivatives as potential anti-gastric cancer agents in vitro and in vivo. Eur J Med Chem 2022, 227, 113888.
71. Yao H; Guo Q; Wang M; Wang R; Xu Z, Discovery of pyrazole N-aryl sulfonate: A novel and highly potent cyclooxygenase-2 (COX-2) selective inhibitors. Bioorg Med Chem 2021, 46, 116344.
72. Liu T; Zhang Y; Liu J; Peng J; Jia X; Xiao Y; Zheng L; Dong Y, Evaluation of the Acute and Sub-Acute Oral Toxicity of Jaranol in Kunming Mice. Front Pharmacol 2022, 13, 903232.
73. Serafim CAL; Araruna MEC; Alves EB Junior; Silva LMO; Silva AO; da Silva MS; Alves AF; Araujo AA; Batista LM, (–)-Carveol Prevents Gastric Ulcers via Cytoprotective, Antioxidant, Antisecretory and Immunoregulatory Mechanisms in Animal Models. Front Pharmacol 2021, 12, 736829.
74. Alves EB Junior; de Oliveira Formiga R; de Lima Serafim CA; Cristina Araruna ME; de Souza Pessoa ML; Vasconcelos RC; de Carvalho TG; de Jesus TG; Araujo AA; de Araujo RF Junior; Vieira GC; Sobral MV; Batista LM, Estragole prevents gastric ulcers via cytoprotective, antioxidant and immunoregulatory mechanisms in animal models. Biomed Pharmacother 2020, 130, 110578.
75. McInnes L; Healy J; Melville J, UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction. arXiv:1802.03426 2018.
76.Blay V; Li X; Gerlach J; Urbina F; Ekins S, Combining DELs and machine learning for toxicology prediction. Drug Discov Today 2022, 27 (11), 103351. [DOI] [PMC free article] [PubMed] [Google Scholar]
77.Tetko IV; Bruneau P; Mewes HW; Rohrer DC; Poda GI, Can we estimate the accuracy of ADME-Tox predictions? Drug Discov Today 2006, 11 (15–16), 700–7. [DOI] [PubMed] [Google Scholar]
78.Nikolova-Jeliazkova N; Jaworska J, An approach to determining applicability domains for QSAR group contribution models: an analysis of SRC KOWWIN. Altern Lab Anim 2005, 33 (5), 461–70. [DOI] [PubMed] [Google Scholar]
79.Jaworska J; Nikolova-Jeliazkova N; Aldenberg T, QSAR applicabilty domain estimation by projection of the training set descriptor space: a review. Altern Lab Anim 2005, 33 (5), 445–59. [DOI] [PubMed] [Google Scholar]
80.Urbina F; Lentzos F; Invernizzi C; Ekins S, Dual use of artificial-intelligence-powered drug discovery. Nature Machine Intelligence 2022, 4 (3), 189–191. [DOI] [PMC free article] [PubMed] [Google Scholar]
81.Urbina F; Lentzos F; Invernizzi C; Ekins S, A teachable moment for dual use. Nature Machine Intelligence 2022, 4, 607. [DOI] [PMC free article] [PubMed] [Google Scholar]
82.Urbina F; Lentzos F; Invernizzi C; Ekins S, AI in drug discovery: A wake-up call. Drug Discov Today 2022, 28 (1), 103410. [DOI] [PMC free article] [PubMed] [Google Scholar]

Supplementary Materials

Supplementary material

NIHMS1904611-supplement-supplementary_material.docx (4.6 MB, docx)