Nucleic Acids Research. 2024 Apr 22;52(W1):W513–W520. doi: 10.1093/nar/gkae303

ProTox 3.0: a webserver for the prediction of toxicity of chemicals

Priyanka Banerjee 1,2, Emanuel Kemmler 3,4, Mathias Dunkel 5, Robert Preissner 6
PMCID: PMC11223834  PMID: 38647086

Abstract

Interaction with chemicals present in drugs, food, the environment and consumer goods is an integral part of everyday life. However, depending on the amount and duration of exposure, such interactions can also result in adverse effects. With the growth of computational approaches, in silico methods can offer significant benefits both to regulatory risk assessment and to the pharmaceutical industry when assessing the safety profile of a chemical. Here, we present ProTox 3.0, which incorporates molecular similarity and machine-learning models for the prediction of 61 toxicity endpoints, including acute toxicity, organ toxicity, clinical toxicity, molecular initiating events (MIEs), adverse outcome pathways (Tox21), several other toxicological endpoints and toxicity off-targets. All ProTox 3.0 models are validated on independent external sets and have shown strong performance. ProTox is intended as a complete, freely available computational platform for in silico toxicity prediction for toxicologists, regulatory agencies, computational chemists and medicinal chemists. The ProTox 3.0 webserver is free and open to all users, has no login requirement, and can be accessed via https://tox.charite.de. The web server takes a 2D chemical structure as input and reports the toxicological profile of the compound for each endpoint with a confidence score, together with an overall toxicity radar plot and a network plot.

Graphical Abstract


Introduction

Toxicity evaluation is one of the main steps in drug and chemical design. Hence, there is a high demand for computational predictive models to evaluate the potential toxic effects of drugs and chemicals. Regulatory decision-making agencies such as the European Medicines Agency (EMA) and the U.S. Food and Drug Administration (FDA), and environmental health protection agencies such as the U.S. Environmental Protection Agency (EPA) and the European Environment Agency (EEA), support the green chemistry paradigm and recognize the importance of green toxicology (1). Important principles of green toxicology include safer designs and early testing, in order to produce safer chemicals and make the testing process sustainable (2). Toxicity testing is a prerequisite for reducing the risk to humans and the environment. One of the important aspects of the green toxicology approach is reducing the number of animal tests and the amount of chemicals used during toxicity testing (3).

It has been estimated that an average of between 10 000 and 25 000 individual chemicals are considered in the course of the development of a new drug. Together, pre-clinical toxicity (animal) and adverse events (human toxicity) account for ∼1/3 of the cases of attrition of drug candidates (4).

Green Toxicology practices promote and encourage the use of new and innovative techniques and strategies that reduce the use of animals for testing and the amounts of chemicals used and disposed of during tests, and that increase the consideration of toxicity in the synthesis, use and regulation of chemicals. Green chemistry focuses on the design of chemical products and processes that minimize or eliminate hazardous substances, reduce waste generation, and conserve energy and resources. By incorporating green chemistry principles into chemical synthesis and manufacturing, Green Toxicology contributes to the development of safer and more environmentally benign chemicals. Often, many more substances still under consideration must be assessed, while the quantities of substances and the resources available for testing are limited. These limitations call for the use of computational and higher-throughput in vitro approaches. Intelligent testing strategies, such as in silico modeling, high-throughput screening and structure–activity relationship analysis, enable efficient and cost-effective identification of potential toxicity concerns. By leveraging these predictive tools, researchers can prioritize safer alternatives and design products with reduced environmental and health impacts (3). Green Toxicology integrates with the principles of Green Chemistry, forming a cohesive approach to sustainable chemical design and usage (5). By acknowledging and addressing the potential adverse outcomes and toxicity associated with chemical development, utilization and disposal, Green Toxicology reinforces the overarching goals of Green Chemistry. Toxicological tools, including in silico modeling, omics technologies and in vitro methods, play a pivotal role in enhancing our understanding of toxicity mechanisms and identifying structure–activity relationships (SAR) and associated effects. With the help of these tools, researchers can often identify and eliminate potentially harmful chemical candidates at an early stage, a concept often referred to as ‘failing early and failing cheaply’ (3). Through a shared emphasis on understanding and mitigating toxicity, Green Toxicology and Green Chemistry foster a holistic approach to sustainable chemical innovation (2,3).

Thus, in silico toxicology is evolving as an integral platform for the prediction of the potential hazards posed by various chemical substances to humans, animals, plants and ecosystems (6). The aim of in silico toxicity models is to complement the existing in vitro toxicity methods in predicting the toxic effects of chemicals, thereby reducing both the time and the financial resources allocated to toxicity assessments. In silico toxicity modeling incorporates knowledge from various fields such as toxicology, biostatistics, systems biology, computer science and many other relevant disciplines (5). In essence, in silico toxicity modeling represents a comprehensive approach to evaluating chemical safety, with the potential to change how we assess and mitigate the risks inherent in chemical exposure.

A good prediction model requires balanced performance between sensitivity and specificity. Sensitivity is the ability to correctly identify the toxic compounds (true positives) in a given dataset, whereas specificity is the ability to correctly identify the non-toxic compounds (true negatives) (6). ProTox 3.0 models have shown balanced performance with respect to sensitivity and specificity. All 61 models reported in the ProTox 3.0 web server showed relatively better performance than commercial software such as Discovery Studio's TOPKAT (Toxicity Prediction by Komputer Assisted Technology; Accelrys, Inc., USA) (http://accelrys.com/) as well as freely accessible tools such as the Toxicity Estimation Software Tool (T.E.S.T.) developed by the U.S. Environmental Protection Agency (https://www.epa.gov/chemical-research/toxicity-estimation-software-tool-test).
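As a minimal sketch of these two measures, the snippet below computes sensitivity, specificity and their mean (balanced accuracy) from binary labels. The function names and the toy labels are illustrative assumptions and are not taken from the ProTox code base.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = toxic, 0 = non-toxic)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def sensitivity(tp, fn):
    """Fraction of truly toxic compounds that were predicted toxic."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def specificity(tn, fp):
    """Fraction of truly non-toxic compounds that were predicted non-toxic."""
    return tn / (tn + fp) if (tn + fp) else 0.0


def balanced_accuracy(tp, tn, fp, fn):
    """Mean of sensitivity and specificity."""
    return 0.5 * (sensitivity(tp, fn) + specificity(tn, fp))


# Hypothetical toy data: 4 toxic and 6 non-toxic compounds.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
bal_acc = balanced_accuracy(tp, tn, fp, fn)
```

A model that labels everything as non-toxic would reach a specificity of 1.0 but a sensitivity of 0.0, which is why the two values are reported side by side.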

Additionally, the emergence of web servers, some of which offer partial access to the wider community, has brought significant benefits. One example is the admetSAR platform, which houses 47 prediction models rooted in QSAR methods (7). Previous versions of the ProTox web server (8,9) have been highly cited and have been found useful in real-world validation (10,11). The ProTox platform has been used as a lecture module in several universities. The results and analyses obtained using the ProTox server are also regularly reported on regulatory platforms such as that of the European Chemicals Agency (ECHA) as additional reports on alternative methods to animal trials (12).

The ProTox 3.0 web server provides several advantages over existing computational models. The ProTox webserver combines knowledge of chemical structures, molecular targets, metabolism and adverse outcome pathways (AOPs). As mentioned in our earlier update (ProTox-II) (8), a novelty of the ProTox webserver in general is that the prediction scheme is classified into different levels of toxicity: oral toxicity, organ toxicity (hepatotoxicity, neurotoxicity, respiratory toxicity, cardiotoxicity and nephrotoxicity), toxicological endpoints (mutagenicity, carcinogenicity, cytotoxicity, immunotoxicity, BBB permeability, ecotoxicity, clinical toxicity and nutritional toxicity), 12 toxicological pathways (AOPs), 15 toxicity targets, 14 targets for molecular initiating events (MIEs) and six molecular targets for metabolism. This scheme provides insights into the possible molecular mechanism behind a toxic response. The latest version, ProTox 3.0, incorporates molecular similarity, pharmacophore-based, fragment propensity and machine learning models for the prediction of various toxicity endpoints. ProTox 3.0, consisting of 61 models, is a freely available computational toxicity prediction webserver enabling the prediction of the largest number of toxicological endpoints to date.

ProTox 3.0 platform

Input information

The user-friendly interface of the ProTox 3.0 web server is self-explanatory and can be used with little or no help. It is designed with an intuitive user perspective in mind. To run a prediction, the user needs to specify either the name (PubChem name) or the canonical SMILES (Simplified Molecular-Input Line Entry System) string of the input compound. Alternatively, the user can use the drawing box provided by ChemDoodle (https://www.chemdoodle.com/) to draw the desired molecular structure. This supports the prediction of even a hypothetical compound before it is synthesized in the lab. Additionally, the user can select a single model, a user-specified set of models or ALL models; if nothing is specified, the web server computes only acute toxicity and toxicity targets by default.

Output information

The result page for the input compound includes information on acute toxicity (predicted median lethal dose (LD50) in mg/kg body weight, toxicity class and prediction accuracy), the structures of the three most similar compounds present in the dataset, and the distribution of physicochemical properties of the input compound compared with other database molecules. This is followed by a table containing the predictions for organ toxicity, toxicological endpoints, toxicological pathways, molecular initiating events and metabolism, each with its predicted class and confidence score. The results page also includes the predicted toxicity targets, with the target's name and the average fit and similarity of the input compound with respect to the pharmacophore and known ligands. Users can select a specific model of interest or all models for the prediction. The prediction results are also shown as a toxicity radar plot for active class predictions. The active and inactive class results are additionally displayed as a network plot for the input compound (Figure 1). The new server allows users to download the results in several file formats (csv, pdf and image files). More detailed information with an example compound output is available in the ‘FAQs’ section of the ProTox 3.0 webserver.

Figure 1. Application case: using the example compound ‘methandrostenolone’ (a withdrawn drug).

Materials and methods

The updated ProTox 3.0 platform is divided into seven different classification steps: (i) acute toxicity (oral toxicity model with six different toxicity classes); (ii) organ toxicity (five models); (iii) toxicological endpoints (eight models); (iv) toxicological pathways (12 models); (v) molecular initiating events (14 models); (vi) metabolism (six models) and (vii) toxicity targets (15 models). Here, we briefly describe each new model available on the ProTox 3.0 server and mention the already published models (8). Detailed information with references, performance scores and the frequency distribution of the most common features present in the training set molecules (for both the toxic/positive class and the non-toxic/negative class) is available under ‘Model info’ on the ProTox 3.0 webserver. A complete description of the data sets used in this study is also provided under ‘Model info’. There are currently a total of 61 models in the ProTox 3.0 update. A total of 28 new models have been added in this version, and the 33 previous models have been updated to match the new server software specification.
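The model counts quoted above can be cross-checked with a few lines of arithmetic. The dictionary names below are illustrative; the split of the 28 new models by category is read off the rows of Tables 1 and 2 rather than stated in the text.

```python
# Number of models per classification step, as listed in the text.
MODEL_COUNTS = {
    "acute toxicity": 1,
    "organ toxicity": 5,
    "toxicological endpoints": 8,
    "toxicological pathways": 12,
    "molecular initiating events": 14,
    "metabolism": 6,
    "toxicity targets": 15,
}

# New models per category, counted from the rows of Tables 1 and 2.
NEW_MODEL_COUNTS = {
    "organ toxicity": 4,
    "toxicological endpoints": 4,
    "molecular initiating events": 14,
    "metabolism": 6,
}

total_models = sum(MODEL_COUNTS.values())
new_models = sum(NEW_MODEL_COUNTS.values())
previous_models = total_models - new_models
```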

The 28 newly developed models on the ProTox 3.0 webserver are based on Random Forest (RF) and deep neural network algorithms combined with eight different data sampling methods (13). The advantage of an RF-based classifier is that it is less prone to overfitting, and in our experiments it performed better than the deep-learning-based models. Two different molecular fingerprints were used: MACCS molecular fingerprints (MACCS Structural keys; Accelrys: San Diego, CA, 2011. http://accelrys.com/) and Morgan circular fingerprints (http://www.rdkit.org/) (14,15). Of the two, modified MACCS fingerprints performed slightly better on most of the validation sets.

More information on the respective models, datasets, and sampling methods is available via the ‘Model info’ and ‘FAQs’ sections of the web server.

The prediction models are implemented in the Python programming language. Machine learning packages such as scikit-learn (http://scikit-learn.org) and the cheminformatics package RDKit (http://www.rdkit.org/) are used for the model implementation. All data are standardised using KNIME (29). A template script (sample API script https://comptox.charite.de/protox3/protox3_api.py) is provided under the description ‘using the API’ in the FAQ section of the ProTox 3.0 webserver. ProTox 3.0 data are stored in a relational MySQL database. The MyChem package is used to handle the chemical information within the database. For most of its functions, MyChem relies on the Open Babel toolbox. The website backend is built using PHP; web access is enabled via the Apache HTTP Server. Redis, an agile key/value store, is used for queueing and assessing API requests.

All the new models are validated using 10-fold cross-validation. For each of the 8 sampling methods, the data were divided into 10 sets in which the ratio of active to inactive compounds was kept constant; 9 sets were used to train the model and the 10th to validate it. The best parameters for the classifier were found using the RandomizedSearchCV function of the sklearn library. Additionally, an external set (unknown to the model) was used for external validation of the models. All the models are assessed using the following performance measures: accuracy, the area under the curve (AUC) of the receiver operating characteristic (ROC), sensitivity, specificity and the F1 score, which measures the quality of binary classification models and ranges from 0 (worst) to 1 (perfect) (16,17). A total of 224 models were constructed for the 28 endpoints (8 models per endpoint, one for each sampling method), and the best-performing model for each endpoint was selected as the final model, saved with the pickle module and implemented on the ProTox 3.0 server.
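The fold construction described above can be illustrated with a small standard-library sketch. It is not the authors' code: the helper names, the round-robin assignment and the toy label vector are assumptions chosen only to show how a constant active/inactive ratio can be kept across 10 folds.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=0):
    """Return k lists of sample indices with a near-identical class ratio."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled members of each class round-robin into the folds.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


def train_validation_splits(labels, k=10, seed=0):
    """Yield (training, validation) index lists; each fold is held out once."""
    folds = stratified_folds(labels, k, seed)
    for held_out in range(k):
        validation = folds[held_out]
        training = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        yield training, validation


# Hypothetical dataset: 30 active (1) and 70 inactive (0) compounds.
labels = [1] * 30 + [0] * 70
splits = list(train_validation_splits(labels))
```

With this toy dataset every validation fold holds 3 actives and 7 inactives, and the remaining 90 compounds form the training portion. In a scikit-learn workflow the same role is usually played by `StratifiedKFold`.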

Acute toxicity

Oral toxicity

The acute toxicity model is constructed by analysing chemical similarities among compounds with known toxic effects and identifying specific toxic fragments. This methodology was elaborated on in our prior publication (9). To populate the acute toxicity model, we have extracted relevant data from the updated version of our in-house database, SuperToxic (18).
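To make the similarity step concrete, the sketch below ranks reference compounds by Tanimoto similarity to a query. The use of the Tanimoto coefficient, the representation of fingerprints as sets of on-bit indices, and all compound names and bit values are assumptions for illustration; the text above does not specify the similarity measure.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bits."""
    a, b = set(bits_a), set(bits_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def most_similar(query_bits, reference, n=3):
    """Return the n reference compounds most similar to the query."""
    scored = [(tanimoto(query_bits, bits), name) for name, bits in reference.items()]
    # Highest similarity first; ties are broken alphabetically by name.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(name, score) for score, name in scored[:n]]


# Hypothetical reference set with made-up fingerprint bits.
reference = {
    "ref_A": {1, 2, 3, 4},
    "ref_B": {1, 2, 3, 9},
    "ref_C": {7, 8},
    "ref_D": {1, 2, 3, 4, 5},
}
query = {1, 2, 3, 4}
hits = most_similar(query, reference)
```

Returning the top three hits mirrors the three most similar compounds that the result page reports alongside the acute toxicity prediction.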

Toxicity targets

Toxicity target prediction is based on the analysis of 15 distinct targets sourced from the Novartis in vitro safety panels (19). These specific protein targets are linked to adverse drug reactions; the methods are discussed extensively in our earlier publication (8).

Organ toxicity

Hepatotoxicity

The drug-induced liver injury model is based on the method reported in our previous publications (8,13).

Neurotoxicity

Drug-induced neurotoxicity can result in severe complications in the normal functioning of the brain, leading to various neurological symptoms or disorders (12). The central nervous system, including the brain and spinal cord, is particularly vulnerable to the effects of neurotoxic substances. Computational and machine learning models can potentially provide more efficient and cost-effective ways to identify neurotoxicity associated with clinical drugs. Data on known neurotoxic and non-neurotoxic drugs are crucial for training accurate models and were curated from the literature (12,13,20). The SMOTEVDM data sampling method (13) was used to construct the model. The ProTox 3.0 prediction model has a balanced accuracy of 86.00% on the cross-validation set and 80.00% on an external validation set. The AUC-ROC scores for cross-validation and external validation are 0.87 and 0.91, respectively (Tables 1 and 2).

Table 1. Performance evaluation on the cross-validation set for the newly included models in the ProTox 3.0 platform in terms of accuracy, sensitivity, specificity and AUC-ROC

Models (cross-validation) Accuracy (%) AUC-ROC Sensitivity (%) Specificity (%)
Organ toxicity
Neurotoxicity 86.00 0.87 82.80 86.90
Nephrotoxicity 83.00 0.86 78.70 86.80
Cardiotoxicity 87.00 0.86 86.00 95.10
Respiratory toxicity 91.80 0.98 90.00 87.50
Toxicological endpoints
Blood–brain barrier (BBB) 98.00 0.96 92.00 97.00
Ecotoxicity 90.00 0.95 92.00 93.00
Clinical toxicity 86.00 0.84 82.40 79.50
Nutritional toxicity 83.80 0.84 80.11 90.00
Molecular initiating events (MIE)
THRα 94.20 0.97 95.70 92.80
THRβ 95.00 0.97 90.90 95.40
TTR 86.00 0.93 86.60 85.30
RYR 97.00 0.98 94.80 96.80
GABAR 89.20 0.95 86.00 92.30
NMDAR 98.00 0.99 98.00 97.50
AMPAR 97.00 0.95 98.70 96.00
KAR 96.00 0.94 90.00 98.00
AChE 92.60 0.95 93.50 89.90
CAR 94.00 0.98 91.90 98.00
PXR 81.30 0.90 78.90 85.50
NADHOX 96.90 0.95 96.70 97.10
VGSC 91.40 0.95 91.50 93.00
NIS 99.00 0.98 98.00 99.00
Metabolism
CYP1A2 95.00 0.97 99.00 97.00
CYP2C19 97.00 0.97 99.00 94.00
CYP2C9 97.00 0.98 97.00 96.00
CYP2D6 84.00 0.92 88.00 86.00
CYP3A4 92.00 0.96 93.00 92.00
CYP2E1 90.00 0.95 98.00 97.00

Table 2. Performance evaluation on external validation for the newly included models in the ProTox 3.0 platform in terms of accuracy, sensitivity, specificity and AUC-ROC

Models (external validation) Accuracy (%) AUC-ROC Sensitivity (%) Specificity (%)
Organ toxicity
Neurotoxicity 80.00 0.91 83.10 78.88
Nephrotoxicity 78.00 0.89 83.30 78.90
Cardiotoxicity 89.00 0.95 87.00 88.20
Respiratory toxicity 88.90 0.93 88.00 85.90
Toxicological endpoints
Blood–brain barrier (BBB) 97.10 0.91 96.00 93.00
Ecotoxicity 81.24 0.85 80.00 81.00
Clinical toxicity 79.87 0.87 82.57 78.50
Nutritional toxicity 84.70 0.85 73.60 85.60
Molecular initiating events (MIE)
THRα 95.60 0.97 95.60 95.80
THRβ 89.50 0.91 70.80 96.20
TTR 85.70 0.94 86.00 85.00
RYR 98.00 0.97 96.00 99.10
GABAR 94.20 0.98 85.70 94.50
NMDAR 96.50 0.98 89.50 96.90
AMPAR 98.90 0.97 87.50 99.00
KAR 97.00 0.96 71.90 99.00
AChE 88.50 0.94 89.80 87.60
CAR 99.00 0.98 72.00 98.00
PXR 88.30 0.92 74.10 89.60
NADHOX 96.00 0.94 81.80 99.30
VGSC 97.10 0.98 87.50 97.60
NIS 99.70 0.99 97.89 98.00
Metabolism
CYP1A2 90.00 0.95 84.00 92.00
CYP2C19 95.00 0.87 95.00 95.00
CYP2C9 90.00 0.97 61.00 94.00
CYP2D6 80.00 0.85 80.00 79.00
CYP3A4 86.00 0.93 82.00 87.00
CYP2E1 89.00 0.90 78.00 97.00

Nephrotoxicity

Drug-induced nephrotoxicity can occur when the kidney is exposed to exogenous or endogenous toxicants. Exposure to nephrotoxic drugs can destabilize the kidney's function of maintaining homeostasis of the body. Nephrotoxicity can occur due to various mechanisms, such as crystal nephropathy, thrombotic microangiopathy, tubular cell toxicity and inflammation (21). Computational and machine learning models can potentially provide more efficient and cost-effective ways to identify nephrotoxicity associated with clinical drugs. Data on known nephrotoxic and non-nephrotoxic drugs are crucial for training accurate models and were curated from the literature (21). The kMedoids2 data sampling method (13) was used to construct the model. The ProTox 3.0 prediction model has a balanced accuracy of 83.00% on the cross-validation set and 78.00% on an external validation set. The AUC-ROC scores for cross-validation and external validation are 0.86 and 0.89, respectively (Tables 1 and 2).

Cardiotoxicity

Drug-induced cardiotoxicity can be caused by inhibition of the human ether-à-go-go-related gene (hERG) channel, a potassium ion channel that plays a crucial role in regulating the cardiac action potential (22,23). The hERG channel is essential for the repolarization of cardiac cells, and its inhibition by certain small molecules can lead to prolongation of the QT interval on an electrocardiogram (ECG). The ProTox 3.0 cardiotoxicity model predicts small-molecule hERG blockers (22). As data on cardiotoxic and non-cardiotoxic chemical compounds, a dataset of 5252 compounds with hERG inhibition values (IC50 and Ki) was obtained from the ChEMBL database (24). The SMOTETC data sampling method (13) was used to construct the model. The ProTox 3.0 prediction model has a balanced accuracy of 87.00% on the cross-validation set and 89.00% on an external validation set. The AUC-ROC scores for cross-validation and external validation are 0.86 and 0.95, respectively (Tables 1 and 2).

Respiratory toxicity

Drug-induced respiratory toxicity refers to adverse effects on the respiratory system caused by exposure to certain drugs or medications. Such exposure can have a major impact on the respiratory system, leading to symptoms ranging from mild respiratory discomfort to severe respiratory distress. Data on known respiratory toxic and respiratory non-toxic drugs were curated from publicly available databases (25). The random oversampling (randOS) data sampling method (13) was used to construct the model. The ProTox 3.0 prediction model has an accuracy of 91.90% on the cross-validation set and 88.90% on an external validation set. The AUC-ROC scores for cross-validation and external validation are 0.98 and 0.93, respectively (Tables 1 and 2).
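Random oversampling in its generic form duplicates randomly chosen minority-class samples until the classes are balanced. The sketch below shows that generic idea with the standard library only; it is an assumption about what randOS does in principle, not a reproduction of the implementation cited as (13), and the sample names are made up.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate random minority-class samples until all classes are equal in size."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_samples, out_labels = list(samples), list(labels)
    for cls, count in counts.items():
        pool = [s for s, lab in zip(samples, labels) if lab == cls]
        # Draw with replacement from the original members of this class.
        for _ in range(target - count):
            out_samples.append(rng.choice(pool))
            out_labels.append(cls)
    return out_samples, out_labels


# Hypothetical imbalanced set: 8 non-toxic (0) and 2 toxic (1) compounds.
samples = ["n1", "n2", "n3", "n4", "n5", "n6", "n7", "n8", "m1", "m2"]
labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
new_samples, new_labels = random_oversample(samples, labels)
```

Resampling of this kind is normally applied to the training portion only, so that duplicated compounds do not leak into the validation fold.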

Toxicological endpoints

ProTox 3.0 webserver currently hosts eight models under the toxicological endpoints category.

Four models, namely the carcinogenicity, mutagenicity, cytotoxicity and immunotoxicity models, have already been reported in our previously published studies (8). More information on the respective models, data points, feature importance and performance metrics is available on the server via Model Info.

In the current update, newly developed models for predicting the blood-brain barrier (BBB) permeability, ecotoxicity, clinical toxicity, and nutritional toxicity have been added to the ProTox 3.0 prediction workflow.

Blood–brain barrier (BBB)

Some toxins, by virtue of their size, lipophilicity or specific transport mechanisms, can bypass or compromise the BBB and enter the brain (26). These toxicants may include environmental pollutants, drugs and certain chemicals. Although clinical experiments are the most accurate way to determine the BBB permeability of a molecule, they are time-consuming and labour-intensive. Data sets on the positive and negative classes of BBB permeability are crucial for training accurate models and were curated from the literature (27). The SMOTETC data sampling method (13) was used to construct the model. The ProTox 3.0 prediction model has a balanced accuracy of 98.00% on the cross-validation set and 97.00% on an external validation set. The AUC-ROC scores for cross-validation and external validation are 0.96 and 0.91, respectively (Tables 1 and 2).

Ecotoxicity

The ecotoxicity model developed on the ProTox 3.0 platform can predict the acute toxicity of an extended chemical space domain to fish, daphnids and algae with good prediction accuracy. Data on known toxic and non-toxic compounds are crucial for training accurate models and were curated from the literature (28,29). The SMOTEVDM data sampling method (13) was used to construct the model. The ProTox 3.0 prediction model has a balanced accuracy of 90.00% on the cross-validation set and 81.20% on an external validation set. The AUC-ROC scores for cross-validation and external validation are 0.95 and 0.85, respectively (Tables 1 and 2).

Clinical toxicity

The clinical toxicity endpoint reported in this work is defined as whether or not a drug-like molecule failed clinical trials due to toxicity. The data were obtained from the ClinTox dataset (30). Clinically toxic and clinically non-toxic drugs/molecules were defined based on the presence of toxicity reports such as withdrawal or failure. The kMedoids2 data sampling method (13) was used to construct the model. The ProTox 3.0 prediction model has a balanced accuracy of 86.00% on the cross-validation set and 79.00% on an external validation set. The AUC-ROC scores for cross-validation and external validation are 0.84 and 0.87, respectively (Tables 1 and 2).

Nutritional toxicity

High-quality toxicological data on food-relevant chemicals were obtained from publicly available databases (31,32). The data were curated for toxic and non-toxic chemicals present in food, and the final dataset was used for training and validation. The kMedoids1 data sampling method (13) was used to construct the model. The ProTox 3.0 prediction model has a balanced accuracy of 83.80% on the cross-validation set and 84.70% on an external validation set. The AUC-ROC scores for cross-validation and external validation are 0.84 and 0.85, respectively (Tables 1 and 2).

Toxicological pathways

The Toxicology in the 21st Century (Tox21) platform, a US toxicology initiative started in 2008, provides a library of data on 10 000 chemicals screened in high-throughput assays against a panel of 12 different biological target-based pathways, which fall into two major groups of adverse outcome pathways (AOPs): the nuclear receptor pathway (seven target-pathway based models) and the stress response pathway (five target-pathway based models). The prediction of chemical compounds active in toxicological pathways is based on the Tox21 dataset (33), and the models are reported in our previous studies (8).

Molecular initiating events (MIE)

Developmental and adult/ageing neurotoxicity needs alternative methods for chemical risk assessment. The formulation of a strategy to screen large numbers of chemicals is highly relevant due to potential exposure to compounds that may have long-term adverse health consequences on the nervous system, leading to neurodegeneration. Adverse outcome pathways (AOPs) provide information on relevant molecular initiating events (MIEs) and key events (KEs) that could inform the development of computational alternatives for these complex effects (34).

There are currently 14 MIE target endpoints developed and hosted on the ProTox 3.0 server. More information on the respective models, data points, feature importance and performance metrics is available on the server via Model Info.

The 14 MIE targets are: thyroid hormone receptor alpha (THRα), thyroid hormone receptor beta (THRβ), transthyretin (TTR), ryanodine receptor (RYR), GABA receptor (GABAR), glutamate N-methyl-D-aspartate receptor (NMDAR), alpha-amino-3-hydroxy-5-methyl-4-isoxazolepropionate receptor (AMPAR), kainate receptor (KAR), acetylcholinesterase (AChE), constitutive androstane receptor (CAR), pregnane X receptor (PXR), NADH-quinone oxidoreductase (NADHOX), voltage-gated sodium channel (VGSC) and Na+/I− symporter (NIS). All the models have a balanced accuracy of more than 80% and AUC-ROC values within the range of 0.90–0.99 for cross-validation and external validation (see Tables 1 and 2).

Metabolism

Phase I metabolism can increase or decrease the activity of a drug, and some of the metabolites formed may have pharmacological activity themselves. Six major cytochrome P450 (CYP) isoforms that are responsible for >90% of the metabolism of clinical drugs are included in this version of the web server: CYP1A2, CYP2C19, CYP2C9, CYP2D6, CYP3A4 and CYP2E1. Datasets have been curated from the literature and publicly available databases (35). More information on the respective models, data points, feature importance and performance metrics is available on the server via Model Info.

All the models have a balanced accuracy of 80% or higher and AUC-ROC values within the range of 0.85–0.98 for cross-validation and external validation (see Tables 1 and 2).
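The summary figures for the metabolism models can be recomputed directly from the CYP rows of Tables 1 and 2. The snippet below is only an arithmetic check; the variable names are illustrative, and the values are copied from the Accuracy and AUC-ROC columns of the two tables.

```python
# isoform: ((cross-validation accuracy %, AUC), (external accuracy %, AUC))
CYP_RESULTS = {
    "CYP1A2": ((95.00, 0.97), (90.00, 0.95)),
    "CYP2C19": ((97.00, 0.97), (95.00, 0.87)),
    "CYP2C9": ((97.00, 0.98), (90.00, 0.97)),
    "CYP2D6": ((84.00, 0.92), (80.00, 0.85)),
    "CYP3A4": ((92.00, 0.96), (86.00, 0.93)),
    "CYP2E1": ((90.00, 0.95), (89.00, 0.90)),
}

# Flatten both validation settings into single lists.
accuracies = [acc for pair in CYP_RESULTS.values() for acc, _ in pair]
aucs = [auc for pair in CYP_RESULTS.values() for _, auc in pair]

min_accuracy = min(accuracies)
auc_range = (min(aucs), max(aucs))
```

The lowest accuracy is the 80.00% reported for CYP2D6 on external validation, and the AUC-ROC values span 0.85 to 0.98.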

Application case

To illustrate the functionalities and possible applications of the ProTox 3.0 web server, methandrostenolone was used as the input compound. Methandrostenolone is an orally active anabolic androgenic steroid. It was introduced to the market in 1960 and was withdrawn in 1982 in several countries (France, Germany, UK and USA); the reason cited for withdrawal was off-label abuse (36).

Using the ProTox 3.0 prediction pipeline, methandrostenolone was predicted to belong to toxicity class 4 for acute oral toxicity, with an LD50 value of 1000 mg/kg and a prediction accuracy of 100.00%. Three structurally similar compounds from the database and the distribution plots of their physicochemical properties are reported (see Figure 1). The drug was predicted to be active for liver toxicity, neurotoxicity and respiratory toxicity. Under the toxicological endpoints class, it was also predicted to be BBB-permeable, immunotoxic and clinically toxic. Four toxicological pathways (NR-AR, NR-AR-LBD, NR-ER and NR-ER-LBD) were predicted to be active with a high probability of 1.0. One MIE endpoint, AChE, was predicted as active for this drug. The metabolic enzyme CYP2C9 was predicted to be active for the drug (as reported in Figure 1). Four different toxicity targets (androgen receptor, amine oxidase A, glucocorticoid receptor and progesterone receptor A) were predicted with probable binding.
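The relation between the predicted LD50 and the toxicity class can be sketched as a simple threshold lookup. The cut-offs below are an assumption: they follow the Globally Harmonized System (GHS) style class boundaries described for earlier ProTox versions and are not stated in this article. The function name is hypothetical.

```python
# (upper LD50 bound in mg/kg, toxicity class); assumed GHS-style cut-offs.
GHS_UPPER_BOUNDS = [(5, 1), (50, 2), (300, 3), (2000, 4), (5000, 5)]


def toxicity_class(ld50_mg_per_kg):
    """Map a predicted oral LD50 (mg/kg) to one of six toxicity classes."""
    if ld50_mg_per_kg <= 0:
        raise ValueError("LD50 must be a positive dose in mg/kg")
    for upper_bound, cls in GHS_UPPER_BOUNDS:
        if ld50_mg_per_kg <= upper_bound:
            return cls
    # Anything above the last bound falls into the least toxic class.
    return 6


methandrostenolone_class = toxicity_class(1000)
```

Under these assumed boundaries an LD50 of 1000 mg/kg falls between 300 and 2000 mg/kg, which is consistent with the class 4 prediction reported for methandrostenolone.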

Conclusions and future updates

Although the field of prediction of chemical toxicities is challenging, it can also be viewed as one of the great opportunities, both in terms of basic science and practical applications (37). Computational approaches can, on the other hand, provide faster toxicity prediction with great prediction accuracy. The possibility of entering a chemical structure into a piece of software to obtain the same information as from an in vivo test is immensely advantageous. We have demonstrated the advance of using different sampling methods to predict the toxicity of 28 new models implemented in the updates of the ProTox 3.0 server. Our results show that sampling methods and RF-based models performed better on cross-validations and external validation sets (See Tables 1 and 2). Here, we have developed a ML-based pipeline with the aim of increasing the accuracy, explainability and applicability of the toxicity predictions for multiple endpoints, taking advantage of the available in vitro, in vivo data and advanced data sampling methods. Two different molecular representations were tested, Morgan fingerprints and pre-trained maximum common feature (MCF) fingerprints using features derived from MACCS fingerprints. Comparison of the ProTox 3.0 models to that of the published models in literature is provided in the supporting document (Supplementary Table S1). The goal of the ProTox platform is to help users predict chemical toxicity through a computational modelling pipeline and to be considered as one of the alternatives for reducing animal testing and tool for green toxicology applications. The web server is easy to use and favours toxicologists, experimental scientists, and researchers in drug discovery and safer chemical designs. The testimony of the good prediction performance of all the models implemented in the ProTox server is the high performance balanced between sensitivity and specificity. 
The ProTox web server can help scientific users apply the computational screening pipeline to assess the chemical toxicity of large libraries of compounds for 61 different toxicity endpoints. The results can be analysed meaningfully as reported on the result page and can give further insights into the molecular mechanisms behind the toxic responses associated with a compound. This classification of endpoints can also help medicinal chemists understand the relationship between structure and chemical toxicity; this information can later be used to guide the structure optimization of lead compounds. The innovative protocol presented in the ProTox classification method can help explore the toxicity mechanism. Moving forward, collecting more data on AOPs will also be essential to allow a mechanistic understanding of drug action by considering targets in the context of biological networks. The synergy between network pharmacology approaches and ML methods could be a potential way of addressing this problem in the future.
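
The balance between sensitivity and specificity referred to in these conclusions can be made concrete with a small sketch. It uses the standard definitions (sensitivity = TP / (TP + FN), specificity = TN / (TN + FP)) and their mean, the balanced accuracy; the function names and the toy labels are hypothetical and serve only to show why plain accuracy can be misleading on imbalanced toxicity data.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels, with 1 = active and 0 = inactive."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    return tp / (tp + fn), tn / (tn + fp)


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of sensitivity and specificity."""
    sensitivity, specificity = sensitivity_specificity(y_true, y_pred)
    return (sensitivity + specificity) / 2


if __name__ == "__main__":
    # Toy imbalanced set: 2 actives, 8 inactives (hypothetical labels).
    y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
    # A model that always predicts "inactive" reaches 80% plain accuracy
    # but detects none of the actives.
    y_pred = [0] * 10
    print(sensitivity_specificity(y_true, y_pred))
    print(balanced_accuracy(y_true, y_pred))
```

In this toy case the always-inactive model has a specificity of 1.0 but a sensitivity of 0.0, so its balanced accuracy is only 0.5. This is the kind of imbalance that sampling methods are meant to counteract.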

The current version of ProTox incorporates structural alerts and toxicophores into the models, but there is a need for improvement in visualizing this information in a meaningful manner within the molecules. In our future updates, we are committed to enhancing the visualization aspects of structural alerts and toxic fragments present in our datasets. We acknowledge the significance of visual representations in facilitating a deeper understanding of the data and its implications. By incorporating more robust visualization tools, our aim is to provide users with clearer and more intuitive insights into the structural alerts and toxic fragments identified in our datasets. We believe that enhancing the visualization aspects will greatly benefit users in interpreting and utilizing the information provided by ProTox effectively.

The ProTox webserver plays a crucial role in democratizing access to sophisticated prediction tools that were previously confined to specialized software. By offering a freely accessible platform, it has already opened, and aims to continue to open, doors for researchers to leverage advanced computational techniques in fields such as drug discovery, chemical safety, cosmetics, food safety, and toxicity prediction. In essence, the availability of platforms like ProTox 3.0 marks a pivotal step towards inclusivity in computational modeling, empowering researchers from diverse backgrounds to contribute to scientific advancements.

Supplementary Material

gkae303_Supplemental_File

Acknowledgements

We thank the students of the Structural Bioinformatics Group at Charité for testing the ProTox 3.0 webserver.

Notes

Present address: Priyanka Banerjee, Institute of Physiology, Charité – University Medicine Berlin, Philippstrasse 12, 10115 Berlin, Germany.

Contributor Information

Priyanka Banerjee, Institute for Physiology & Science-IT, Charité – University Medicine Berlin, 10115 Berlin, Germany; Member of the KFO 339: Food Allergy and Tolerance (Food@), Clinical Research Unit funded by the German Research Foundation, Berlin, Germany.

Emanuel Kemmler, Institute for Physiology & Science-IT, Charité – University Medicine Berlin, 10115 Berlin, Germany; Member of the KFO 339: Food Allergy and Tolerance (Food@), Clinical Research Unit funded by the German Research Foundation, Berlin, Germany.

Mathias Dunkel, Institute for Physiology & Science-IT, Charité – University Medicine Berlin, 10115 Berlin, Germany.

Robert Preissner, Institute for Physiology & Science-IT, Charité – University Medicine Berlin, 10115 Berlin, Germany.

Data availability

The ProTox 3.0 web server is free and open to all users, has no login requirement, and can be accessed via https://tox.charite.de.

Supplementary data

Supplementary Data are available at NAR Online.

Funding

Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) as part of the clinical research unit CRU339: Food allergy and tolerance (FOOD@) [428445448, 428447634]; German Research Foundation (DFG). Funding for open access charge: Charité – University Medicine Berlin.

Conflict of interest statement. None declared.

References

  • 1. Zhang L., McHale C.M., Greene N., Snyder R.D., Rich I.N., Aardema M.J., Roy S., Pfuhler S., Venkatactahalam S. Emerging approaches in predictive toxicology. Environ. Mol. Mutagen. 2014; 55:679–688.
  • 2. Maertens A., Anastas N., Spencer P.J., Stephens M., Goldberg A., Hartung T. Green toxicology. ALTEX. 2014; 31:243–249.
  • 3. Crawford S.E., Hartung T., Hollert H., Mathes B., van Ravenzwaay B., Steger-Hartmann T., Studer C., Krug H.F. Green Toxicology: a strategy for sustainable chemical and material development. Environ. Sci. Eur. 2017; 29:16.
  • 4. Guengerich F.P. Mechanisms of drug toxicity and relevance to pharmaceutical development. Drug Metab. Pharmacokinet. 2011; 26:3–14.
  • 5. Hardy B., Douglas N., Helma C., Rautenberg M., Jeliazkova N., Jeliazkov V., Nikolova I., Benigni R., Tcheremenskaia O., Kramer S. et al. Collaborative development of predictive toxicology applications. J. Cheminform. 2010; 2:7.
  • 6. Raies A.B., Bajic V.B. In silico toxicology: computational methods for the prediction of chemical toxicity. Wiley Interdiscip. Rev. Comput. Mol. Sci. 2016; 6:147–172.
  • 7. Yang H., Lou C., Sun L., Li J., Cai Y., Wang Z., Li W., Liu G., Tang Y. admetSAR 2.0: web-service for prediction and optimization of chemical ADMET properties. Bioinformatics. 2019; 35:1067–1069.
  • 8. Banerjee P., Eckert A.O., Schrey A.K., Preissner R. ProTox-II: a webserver for the prediction of toxicity of chemicals. Nucleic Acids Res. 2018; 46:W257–W263.
  • 9. Drwal M.N., Banerjee P., Dunkel M., Wettig M.R., Preissner R. ProTox: a web server for the in silico prediction of rodent oral toxicity. Nucleic Acids Res. 2014; 42:W53–W58.
  • 10. Banerjee P., Ulker O.C. Combinative ex vivo studies and in silico models ProTox-II for investigating the toxicity of chemicals used mainly in cosmetic products. Toxicol. Mech. Methods. 2022; 32:542–548.
  • 11. Arulanandam C.D., Hwang J.-S., Rathinam A.J., Dahms H.-U. Evaluating different web applications to assess the toxicity of plasticizers. Sci. Rep. 2022; 12:19684.
  • 12. Giorgini M., Taroncher M., Ruiz M.-J., Rodríguez-Carrasco Y., Tolosa J. In vitro and predictive computational toxicology methods for the neurotoxic pesticide amitraz and its metabolites. Brain Sci. 2023; 13:252.
  • 13. Banerjee P., Dehnbostel F.O., Preissner R. Prediction is a balancing act: importance of sampling methods to balance sensitivity and specificity of predictive models based on imbalanced chemical data sets. Front. Chem. 2018; 6:362.
  • 14. Rogers D., Hahn M. Extended-connectivity fingerprints. J. Chem. Inf. Model. 2010; 50:742–754.
  • 15. Durant J.L., Leland B.A., Henry D.R., Nourse J.G. Reoptimization of MDL keys for use in drug discovery. J. Chem. Inf. Comput. Sci. 2002; 42:1273–1280.
  • 16. Bewick V., Cheek L., Ball J. Statistics review 13: receiver operating characteristic curves. Crit. Care. 2004; 8:508–512.
  • 17. Pepe M.S. Receiver operating characteristic methodology. J. Am. Stat. Assoc. 2000; 95:308–311.
  • 18. Schmidt U., Struck S., Gruening B., Hossbach J., Jaeger I.S., Parol R., Lindequist U., Teuscher E., Preissner R. SuperToxic: a comprehensive database of toxic compounds. Nucleic Acids Res. 2009; 37:D295–D299.
  • 19. Lounkine E., Keiser M.J., Whitebread S., Mikhailov D., Hamon J., Jenkins J.L., Lavan P., Weber E., Doak A.K., Côté S. et al. Large-scale prediction and testing of drug activity on side-effect targets. Nature. 2012; 486:361–367.
  • 20. Zhao X., Sun Y., Zhang R., Chen Z., Hua Y., Zhang P., Guo H., Cui X., Huang X., Li X. Machine learning modeling and insights into the structural characteristics of drug-induced neurotoxicity. J. Chem. Inf. Model. 2022; 62:6035–6045.
  • 21. Wu H., Huang J. Drug-induced nephrotoxicity: pathogenic mechanisms, biomarkers and prevention strategies. Curr. Drug Metab. 2018; 19:559–567.
  • 22. Wang T., Sun J., Zhao Q. Investigating cardiotoxicity related with hERG channel blockers using molecular fingerprints and graph attention mechanism. Comput. Biol. Med. 2023; 153:106464.
  • 23. Choi K.-E., Balupuri A., Kang N.S. The study on the hERG blocker prediction using chemical fingerprint analysis. Molecules. 2020; 25:2615.
  • 24. Mendez D., Gaulton A., Bento A.P., Chambers J., De Veij M., Félix E., Magariños M.P., Mosquera J.F., Mutowo P., Nowotka M. et al. ChEMBL: towards direct deposition of bioassay data. Nucleic Acids Res. 2019; 47:D930–D940.
  • 25. Harris C., Sander C.R. Late respiratory effects of cancer treatment. Curr. Opin. Support. Palliat. Care. 2017; 11:197–204.
  • 26. Hou T.J., Xu X.J. ADME evaluation in drug discovery. 3. Modeling blood-brain barrier partitioning using simple molecular descriptors. J. Chem. Inf. Comput. Sci. 2003; 43:2137–2152.
  • 27. Kumar R., Sharma A., Alexiou A., Bilgrami A.L., Kamal M.A., Ashraf G.M. DeePred-BBB: a blood brain barrier permeability prediction model with improved accuracy. Front. Neurosci. 2022; 16:858126.
  • 28. Mougin C., Bouchez A., Denaix L., Garric J., Martin-Laurent F. ECOTOX, new questions for terrestrial and aquatic ecotoxicology. Environ. Sci. Pollut. Res. Int. 2018; 25:33841–33843.
  • 29. Takata M., Lin B.-L., Xue M., Zushi Y., Terada A., Hosomi M. Predicting the acute ecotoxicity of chemical substances by machine learning using graph theory. Chemosphere. 2020; 238:124604.
  • 30. Wu Z., Ramsundar B., Feinberg E.N., Gomes J., Geniesse C., Pappu A.S., Leswing K., Pande V. MoleculeNet: a benchmark for molecular machine learning. Chem. Sci. 2018; 9:513–530.
  • 31. Shi X.-X., Wang F., Wang Z.-Z., Huang G.-Y., Li M., Simal-Gandara J., Hao G.-F., Yang G.-F. Unveiling toxicity profile for food risk components: a manually curated toxicological databank of food-relevant chemicals. Crit. Rev. Food Sci. Nutr. 2022; 10.1080/10408398.2022.2152423.
  • 32. Dorne J.L.C.M., Richardson J., Livaniou A., Carnesecchi E., Ceriani L., Baldin R., Kovarich S., Pavan M., Saouter E., Biganzoli F. et al. EFSA’s OpenFoodTox: an open source toxicological database on chemicals in food and feed and its future developments. Environ. Int. 2021; 146:106293.
  • 33. Betts K.S. Tox21 to date: steps toward modernizing human hazard characterization. Environ. Health Perspect. 2013; 121:2013.
  • 34. Gadaleta D., Spînu N., Roncaglioni A., Cronin M.T.D., Benfenati E. Prediction of the neurotoxic potential of chemicals based on modelling of molecular initiating events upstream of the adverse outcome pathways of (Developmental) neurotoxicity. Int. J. Mol. Sci. 2022; 23:3053.
  • 35. Preissner S., Kroll K., Dunkel M., Senger C., Goldsobel G., Kuzman D., Guenther S., Winnenburg R., Schroeder M., Preissner R. SuperCYP: a comprehensive database on Cytochrome P450 enzymes including a tool for analysis of CYP-drug interactions. Nucleic Acids Res. 2010; 38:D237–D243.
  • 36. Siramshetty V.B., Nickel J., Omieczynski C., Gohlke B.O., Drwal M.N., Preissner R. WITHDRAWN - A resource for withdrawn and discontinued drugs. Nucleic Acids Res. 2016; 44:D1080–D1086.
  • 37. Livingstone D.J. Computational techniques for the prediction of toxicity. Toxicol. In Vitro. 1994; 8:873–877.

Articles from Nucleic Acids Research are provided here courtesy of Oxford University Press
