Author manuscript; available in PMC: 2025 Jul 1.
Published in final edited form as: Xenobiotica. 2023 Aug 8;54(7):352–358. doi: 10.1080/00498254.2023.2245049

In Silico ADME/Tox Comes of Age: Twenty Years Later

Sean Ekins 1,*, Thomas R Lane 1, Fabio Urbina 1, Ana C Puhl 1
PMCID: PMC10850432  NIHMSID: NIHMS1923455  PMID: 37539466

Abstract

1. In the early 2000s pharmaceutical drug discovery was beginning to use computational approaches for absorption, distribution, metabolism, excretion and toxicity (ADME/Tox, also known as ADMET) prediction. This emphasis on prediction was an effort to reduce the risk of later stage failures from ADME/Tox.

2. Much has been written in the intervening twenty-plus years, and companies have invested significantly in developing these in silico capabilities, as can be gleaned from their publications. It is therefore an appropriate time to reflect briefly on what was proposed then and what the reality is today.

3. Twenty years ago, we tended to optimize bioactivity and perhaps one ADME/Tox property at a time. Pharmaceutical companies then needed a whole infrastructure for models: in silico and in vitro experts, IT, champions on a project team, educators and management support. We are now in the age of generative de novo design, in which bioactivity and many ADME/Tox properties can be optimized, and large language model technologies are available.

4. Challenges remain, such as the current focus on very large molecules, which may fall outside the scope of existing ADME/Tox models.

5. We take this opportunity to look forward, given the increasing amount of public ADME/Tox data and the expanded range of algorithms available.

Keywords: ADME/Tox, in silico, de novo

INTRODUCTION

If we follow the evolution of computational models since the 1980s, a picture emerges that parallels drug discovery as datasets and software improved (Table 1). After academia and industry had spent several years developing in vitro methods for absorption, distribution, metabolism, excretion and toxicity (ADME/Tox), combinatorial chemistry and high throughput screening in the 1980s–1990s massively increased the number of compounds available, which then needed testing (Ekins and others 2000a) (Table 1). At the same time, pressure had been building from the increasing costs of drug development and the failure rates associated with ADME/Tox attrition (Kennedy 1997). Determining ADME/Tox properties therefore presented a challenge, which it was suggested could be met by using various types of computational models or filters (Ekins and others 2000b; Lipinski and others 1997). Nascent steps were also taken to use an array of computational models to predict drug-drug interactions (Ekins and Wrighton 2001) using various molecular descriptors and 3D pharmacophores (Ekins and others 2001).

Table 1.

Summary of in silico ADME/Tox models through the decades and into the future

Decade | Summary
1980s | Datasets limited to tens of compounds. Models were very basic: QSAR models, statistical models, CoMFA, etc.
1990s | Datasets expanded to low hundreds of compounds; development of simple rules (Ro5) and more sophisticated methods, trees and 3D pharmacophores; commercial tools rule.
2000s | Bigger datasets due to high throughput ADME (1,000 compounds per week), creating datasets in the tens of thousands; expansion of the machine learning algorithms used (e.g. SVM); multiple optimization; growth in companies addressing ADME/Tox models.
2010s | Datasets for major properties exceed 100,000–200,000; growth of open data and open algorithms. Deep learning starts to be used. Availability of open-source software for cheminformatics. Pharma donates some ADME data to public databases. Early models for drug induced liver injury.
2020s | Datasets grow further still in large pharma. Public data in ChEMBL / PubChem supplemented by public efforts at NIH etc. Shift to larger molecules like PROTACs presents challenges for existing models; development of generative tools requiring ADME/Tox models as inputs. Large language models for ADME/Tox. Predict everything all at once.
Future | Models expand the chemical property space further beyond PROTACs. Algorithms provide instant predictions of all properties: ADME/Tox, PK, etc.

Some of the earliest proposals to simulate ADME properties made comparisons to aeronautical design and attempted to reduce complex in vivo disposition to discrete mechanisms and targets that could be modeled (Selick and others 2002), a view echoed by those in pharmaceutical companies (Dickins and Modi 2002). Many of these initial goals were likely met within just a few years, which pointed to the probable gains in efficiency, and the impact on how drug discovery was conducted, if such models were integrated. Unfortunately this did not happen quickly, and it remains to be seen whether we are at that point now. Early on it was also suggested that the ability to predict ADME properties in silico would lead to fewer design-make-test cycles. There was also discussion of the need to consider experimental and in silico data in parallel for ideal optimization (Ekins and others 2002). At that time, it was felt that no single in silico method would predict all the properties, so several distinct approaches were needed. Literature and commercially available in silico models already existed for many properties, and these varied tremendously in scope, with examples including solubility and CYP-mediated metabolism predictions (Butina and others 2002; van de Waterbeemd and Gifford 2003). The synergy between in vitro and in silico approaches was also a focus as the throughput of the various assays increased to hundreds of compounds a week. At the time this was probably just about keeping up with the pace of medicinal chemistry in many pharmaceutical companies, creating bottlenecks elsewhere in decision making (Yu and Adedoyin 2003). This also led to efforts to merge ADME/Tox models with systems biology approaches to capture some of this biological complexity and aid more accurate predictions (Ekins and others 2005). Roche scientists subsequently highlighted their use of ADME/Tox models such as metabolic stability and phospholipidosis, and focused on hERG models based on physiological parameters and structural fragments (Stahl and others 2006). Even though such models were promising, it was still noted in 2008 that predicting rare toxicity events was outside the scope of such computational approaches (Muster and others 2008).

By 2010 precompetitive initiatives aimed at sharing such ADME/Tox data had emerged, and we and others proposed that the time had come to create freely available ADME/Tox databases (Ekins and Williams 2010). These data could then be used to create predictive models with open source software that could match or even surpass the functionality of commercial tools (Gupta and others 2010). By 2014 the public datasets for many ADME/Tox models contained hundreds to thousands of compounds, and even data from many different laboratories could be combined to build reliable classification models (Ekins 2014). These datasets, together with various websites and open software toolkits (Landrum 2020; Willighagen and others 2017), were starting to gain traction.

An in silico consortium made up of several pharmaceutical companies, including Lilly, Vertex, Abbvie, Roche, Pfizer, AstraZeneca, and Genentech, described how most of them used ADME models trained on datasets on the order of 100s to 1000s of compounds (Lombardo and others 2017). In reality, the internal datasets of some of these companies exceeded 100,000 compounds; for example, by 2010 Pfizer had 200,000 compounds with metabolic stability data (Gupta and others 2010). The models and their reported impact were as follows: at AstraZeneca, the solubility model showed a 7-fold increase in compounds with good solubility; at Genentech, the microsomal stability model resulted in a 2-fold increase in stable compounds; and at Lilly, the CYP3A4 time dependent inhibition model improved the situation 3-fold. In addition, the companies described further models, such as human protein binding, MDCK-MDR1 and metabolic stability (Genentech), MDCK (Abbvie), and PGP efflux and MDCK permeability (Lilly) (Lombardo and others 2017). A more recent account of nearly two decades of in silico ADME/Tox at Bayer provided an overview of the algorithms, descriptors and dataset sizes (Table 1) that enabled the development of an in-house informatics platform combining a data warehouse, data visualization tools and compute tools, as well as interfaces to document and project management (Goller and others 2020). The authors stated that successful application of these tools depended on model quality and relevance, as well as on ease of access and the interpretability of the results produced (Goller and others 2020).

By 2020, there was a widespread emphasis on achieving reproducibility in computational models used for drug discovery, which also extended to ADME/Tox models (Schaduangrat and others 2020). Partnerships between the public and private sectors had even sprung up in Japan (iD3-INST) to address the modeling of pharmacokinetics and cardiac toxicity, with the selected models showing promising predictive performance (Komura and others 2021). The efforts of the National Institutes of Health (NIH) to increase the datasets for ADME properties such as PAMPA, rat liver microsome stability and solubility have led to over 20,000 compounds being tested in each in vitro assay and to the development of random forest and graph-based neural network models. However, it appears that only a subset of these data has been deposited in PubChem to date (Siramshetty and others 2021). The importance of machine learning in early drug discovery, and in particular for predicting ADME and pharmacokinetic parameters, was emphasized in another recent review from scientists at Sanofi which, apart from the mention of a few newer algorithms (generative pre-training), could have been written at any time in the preceding 20 years (Pillai and others 2022). The growing need to make toxicology models available has led academic and industry groups to describe how findable, accessible, interoperable, and reusable (FAIR) principles could be applied to impact regulatory acceptance of such models (Cronin and others 2023). Why stop at applying these principles to toxicology models alone, rather than applying them to all published computational models? Certainly, approaches that model in vivo toxicity, including initiatives such as CATMoS (Mansouri and others 2021), are being evaluated by regulatory agencies for their utility and applicability as a potential replacement for in vivo rat acute oral toxicity studies (Minerali and others 2020). Similarly, these toxicity prediction efforts could be applied to other species and assist in reducing the number of animals used in such studies (Lane and others 2023).

Current Challenges and Opportunities

The shift by companies toward developing larger molecules (beyond the rule of 5) such as PROTACs comes with challenges in reliably predicting their ADME/Tox properties. Machine learning prediction for these molecules has been limited because few PROTACs with in vitro ADME/Tox data populate public databases, although companies may have much larger datasets of their own focused on their chemistry. There may be considerably more ADME/Tox data for other classes of large molecules, such as macrocycles, peptides and natural products, which have been around for longer than the 20 years since the development of PROTACs. For PROTACs, however, public databases are quite limited in their utility for building such models. A recent analysis across different pharmaceutical companies highlighted some of these issues (Volak and others 2023). While there have been limited efforts to assess the permeability of PROTACs computationally (Poongavanam and others 2023), most efforts likely use proprietary data that are not accessible. Ideally, data for PROTACs will be made at least partially publicly available, with datasets describing their solubility, metabolic stability and hERG inhibition being the starting points of most interest.
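To make the phrase "beyond the rule of 5" concrete, the following is a minimal sketch that counts rule-of-5 violations (Lipinski and others 1997) from precomputed descriptors. The class and function names are illustrative, and the descriptor values are hypothetical placeholders rather than measured data for any real compound; in practice they would be calculated with a cheminformatics toolkit.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed descriptors for one molecule (values supplied by the user)."""
    mol_weight: float       # molecular weight in Da
    logp: float             # calculated octanol-water partition coefficient
    h_bond_donors: int      # number of hydrogen bond donors
    h_bond_acceptors: int   # number of hydrogen bond acceptors


def ro5_violations(d: Descriptors) -> list:
    """Return the names of the rule-of-5 criteria that the molecule violates."""
    checks = {
        "MW > 500": d.mol_weight > 500,
        "logP > 5": d.logp > 5,
        "H-bond donors > 5": d.h_bond_donors > 5,
        "H-bond acceptors > 10": d.h_bond_acceptors > 10,
    }
    return [name for name, violated in checks.items() if violated]


# Hypothetical descriptor values, chosen only for illustration.
small_molecule = Descriptors(mol_weight=350.4, logp=2.8,
                             h_bond_donors=2, h_bond_acceptors=5)
protac_like = Descriptors(mol_weight=950.0, logp=5.6,
                          h_bond_donors=4, h_bond_acceptors=14)

for label, desc in [("small molecule", small_molecule),
                    ("PROTAC-like molecule", protac_like)]:
    print(label, "violations:", ro5_violations(desc))
```

A molecule that breaks several criteria at once, as the hypothetical PROTAC-like example does, is likely to sit outside the chemical space on which most public ADME/Tox models were trained.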

In 2002 we highlighted some of the ADME/Tox models we thought would be key to the ADMET field (Ekins and others 2002), and many of these had a substantial amount of publicly available data by 2023 (Table 2). Table 2 also shows the size of the publicly available datasets for each property, indicating the progress made in the past two decades. This represents a valuable starting point for machine learning model development outside of big pharmaceutical companies, which have access to datasets ranging from tens of thousands of compounds per property, in the case of Bayer as of 2020 (Goller and others 2020), through to hundreds of thousands, in the case of Pfizer in 2010 (Gupta and others 2010). This is, however, only a very small sample of the wide range of properties that have been modeled. A more exhaustive listing of ADME/Tox models would include drug induced liver injury, nephrotoxicity, ecotoxicity and beyond, while several recent publications from big pharmaceutical companies indicate their current focus (Table 3).

Table 2.

Primary ADME/Tox models for drug discovery, public sources of datasets compared with a big pharma example in 2020 (Bayer) (Goller and others 2020).

Primary | Endpoint | Approximate training set size (public sources) | Dataset size in industry (Goller and others 2020)
Solubility (pH 7.4) | Thermodynamic solubility | 72,000 | 10,000–30,000
Absorption (Caco-2) | Permeability | 1,400 | >10,000
 | Apparent permeability | 2,200 | 
 | Efflux ratio | 1,100 | >10,000
Mutagenicity (AMES) | Induces revertant colony growth | 6,500 | >10,000
Bioavailability (oral, human) | % | 1,900 | -
Metabolic stability (multiple species) | %, clearance, t1/2 | 200–1,000 | >10,000
BBB penetration | Binary | 4,200 | -
Cardiac toxicity (hERG) | IC50 (CHO/Hek293)/Ki | 1,200–1,800 | >10,000
Plasma protein binding (human) | % | 1,000 | >30,000

Table 3.

Examples of recent ADME/Tox models from big pharmaceutical companies.

Company | ADME/Tox models
Pfizer | hERG (Tysinger and others 2023), DILI (Martin and others 2022), fraction unbound (Winiwarter and others 2019)
J&J | Tox21 datasets (Zhang and others 2021), DILI (Rao and others 2023), human metabolic stability (Van Rompaey and others 2023), polypharmacology (Liu and others 2021), kinome profiling (Janssen and others 2019)
Novartis | Local vs global ADME models (Di Lascio and others 2023), LogP (Isert and others 2023), BBB (Hamzic and others 2022), ADR profiling (Ietswaart and others 2020)
Abbvie | Off target safety assessment (Rao and others 2019), P-gp substrates (Esposito and others 2020)
Roche | Mouse and rat PK (Stoyanova and others 2023), genotoxicity (Zeller and others 2020)
AstraZeneca | Drug interactions (Gill and others 2023), PK (Obrezanova and others 2022)
Sanofi | Phototoxicity (Schmidt and others 2019)
GSK | Volume of distribution (Murad and others 2021)

Outside of drug discovery, recent advances in machine learning for image generation (Ramesh and others 2022) and the advent of Transformer-based large language models (LLMs) (Vaswani and others 2017) have shifted the landscape of machine learning entirely, accelerating research into the scaling of both models and data as well as into the structure of new models (Brown and others 2020; Bubeck and others 2023; Lewis and others 2019). This has led to rapid breakthroughs in model capabilities, including question answering, text-to-image translation, language translation, and even emergent properties not explicitly trained for (Chowdhery and others 2022; Srivastava and others 2022). This research lends itself to tasks in drug discovery because each molecule can be represented as text, namely through the Simplified Molecular Input Line Entry System (SMILES) (Weininger 1988). This simple SMILES notation, while not capturing all information relevant to molecular features, has proven to be an effective enough representation that LLMs have shown state-of-the-art results in some molecular property and ADME predictions (He and others 2022; Irwin and others 2022; Lu and Zhang 2022). This new approach to ADME modeling is in its infancy, and to date only small datasets (relative to dataset sizes in typical large language tasks) have been used to train such models. The work of Hoffmann and others revealed data and parameter scaling laws for Transformer-based LLMs, showing that, for compute-optimal training, dataset size and the number of model parameters should be scaled in roughly equal proportion (for a fixed compute budget, ~20 text tokens per parameter are optimal for training) (Hoffmann and others 2022). While the analogous scaling law for SMILES-encoded molecules has not yet been investigated, it can be assumed, given the current number of parameters in state-of-the-art models (> 1 trillion), that ADME/Tox data sparsity and not model complexity will be the limiting factor in utilizing the predictive power of LLMs for ADME/Tox predictions.
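The data sparsity argument can be illustrated with some rough arithmetic. The sketch below splits a SMILES string into tokens with a deliberately simple regular expression, then applies the ~20 tokens per parameter ratio to a public dataset of 72,000 compounds (the solubility entry in Table 2). Two assumptions are made purely for illustration: that the text-derived ratio carries over to SMILES, which as noted above has not been investigated, and that an average molecule yields about 40 tokens. The function names are likewise hypothetical.

```python
import re

# Simple SMILES tokenizer (an illustrative choice, not a standard):
# bracket atoms, two-letter halogens, two-digit ring closures, then single characters.
SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|.")

# Approximate compute-optimal ratio reported for text by Hoffmann and others (2022).
TOKENS_PER_PARAM = 20


def tokenize(smiles: str) -> list:
    """Split a SMILES string into tokens."""
    return SMILES_TOKEN.findall(smiles)


def compute_optimal_params(n_tokens: float, ratio: float = TOKENS_PER_PARAM) -> float:
    """Model size (parameters) that a dataset of n_tokens would support at the given ratio."""
    return n_tokens / ratio


def compute_optimal_tokens(n_params: float, ratio: float = TOKENS_PER_PARAM) -> float:
    """Training tokens that a model of n_params would call for at the given ratio."""
    return n_params * ratio


aspirin = "CC(=O)Oc1ccccc1C(=O)O"
print("aspirin tokens:", len(tokenize(aspirin)))

n_compounds = 72_000            # public solubility dataset size (Table 2)
assumed_tokens_per_smiles = 40  # ASSUMPTION: rough average, not a measured value
dataset_tokens = n_compounds * assumed_tokens_per_smiles

print("dataset tokens:", dataset_tokens)
print("parameters supported:", compute_optimal_params(dataset_tokens))
print("tokens wanted by a 1-trillion-parameter model:",
      compute_optimal_tokens(1e12))
```

Under these assumptions, the public dataset supplies millions of tokens, while a trillion-parameter model would call for tens of trillions, which is the gap the paragraph above describes.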

Are we any better off with all these massive datasets in big pharmaceutical companies? Individually each company may be, but collectively they might be able to do much more if their resources were combined. The comparatively limited number of mergers of such large companies over the last decade has prevented testing whether combining their ADME/Tox data would be synergistic or merely additive. For example, the merger of Bayer and Schering greatly increased the diversity of the molecules and required harmonization of the data (Goller and others 2020). Combining such data likely creates significant integration challenges. One way to address this issue is federated learning (Figure 1), in which a single machine learning model is trained using distinct private datasets without the datasets, and therefore the molecule structures, ever being shared (Konečný and others 2015; Mittone and others 2023). The method works by sharing encrypted model weights with each private dataset host, making local parameter updates with local data, and finally sending back the new updates and aggregating them into a single model. Federated learning is therefore a path forward for combining datasets while keeping them private, and thus may be a way to harness the power of the massive datasets locked away in pharmaceutical companies. A proof-of-concept has already been created (e.g. MELLODDY) (Oldenhof and others 2022), which suggests that a similar effort in the ADME/Tox space could be effective and is worth evaluating. In the public domain we have nowhere near the amount of data or structural diversity, but we can build models that are predictive within a representative applicability domain. In conjunction with generating new data for a limited, specific set of ADME/Tox properties during a project, such models have the potential to enhance efficiency and significantly reduce some of the costs of drug discovery by eliminating the need to test these properties individually for every compound. In our own case, as a small company with limited resources, we use contract research organizations to test solubility, metabolic stability, Caco-2 permeability, CYP inhibition and hERG using standard methods, which provides data for our drug discovery projects and lets us iteratively enhance our own ADME/Tox models. These data also present opportunities to test our own machine learning models as we generate them over the years for compounds of interest.

Figure 1.

Schematic of federated learning
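The federated learning loop (share weights, update locally, aggregate) can be sketched in a few lines. The example below is a toy illustration under several assumptions: it uses a one-variable linear model, synthetic data standing in for three private sites, and a size-weighted average of the locally updated weights as the aggregation step. It omits the encryption of model weights entirely, and all function names are hypothetical. It is not the method used by MELLODDY or any other cited work.

```python
import random


def local_update(weights, data, lr=0.05, epochs=20):
    """Gradient descent on one site's private (x, y) pairs for the model y = w*x + b."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)


def federated_average(updates, sizes):
    """Aggregate site updates into one model, weighting each site by its dataset size."""
    total = sum(sizes)
    w = sum(u[0] * s for u, s in zip(updates, sizes)) / total
    b = sum(u[1] * s for u, s in zip(updates, sizes)) / total
    return (w, b)


def train_federated(site_datasets, rounds=50):
    """Only model weights move between the coordinator and the sites; data stay local."""
    global_weights = (0.0, 0.0)
    sizes = [len(d) for d in site_datasets]
    for _ in range(rounds):
        updates = [local_update(global_weights, d) for d in site_datasets]
        global_weights = federated_average(updates, sizes)
    return global_weights


# Synthetic stand-ins for three private datasets drawn from y = 2x + 1 plus noise.
rng = random.Random(0)


def make_site(n, low, high):
    xs = [rng.uniform(low, high) for _ in range(n)]
    return [(x, 2.0 * x + 1.0 + rng.gauss(0, 0.02)) for x in xs]


sites = [make_site(30, 0.0, 1.0), make_site(50, 1.0, 2.0), make_site(20, -1.0, 0.0)]
w, b = train_federated(sites)
print("global model: w = %.3f, b = %.3f" % (w, b))
```

Each synthetic site covers a different range of x, loosely mirroring companies whose compounds occupy different regions of chemical space; the aggregated model recovers the shared trend without any site exposing its data points.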

In a little over 20 years, we can say with some certainty that in silico ADME/Tox models have come a long way from their relatively humble beginnings and small training sets. While there is still scope for major improvements, the recent development of newer algorithms and of software for de novo design of new molecules, together with the shift to larger molecules like PROTACs, creates opportunities to update the models’ capabilities. These developments will in turn significantly influence the amount of data generated. It is possible that we have already reached the peak of in vitro ADME/Tox data generation and a turning point at which it becomes more focused. Commercial software tools will now have to prioritize generative molecule design integrated with these ADME/Tox models, and ultimately the predicted properties might become imperceptible to drug designers as such software designs synthesizable molecules with ideal bioactivity and ADME/Tox properties. Large language models may also lower the barriers to using such software further and improve accessibility. With such models, users would simply ask the software to predict properties of their molecules of interest, or even ask it to design molecules that remove metabolic liabilities or improve solubility, for example. We are not there yet, as the general large language models currently available are not trained explicitly on ADME/Tox or physicochemical property data, but it is within reach. This presents opportunities for how we could integrate our 20-plus years of in silico ADME/Tox models in the future. We are now on the verge of accomplishing, or potentially surpassing, what was envisioned by us and others over two decades ago.

ACKNOWLEDGMENTS

SE kindly acknowledges the many collaborators on in silico ADME/Tox with whom he has been fortunate to work over the past 20+ years.

Grant information

We kindly acknowledge NIH funding: 2R44GM122196-04A1 from NIGMS and 2R44ES031038-02A1 from NIEHS.

ABBREVIATIONS USED

ADME/Tox

absorption, distribution, metabolism, excretion and toxicity

Footnotes

Conflicts of interest

S.E. is owner, all others are employees of Collaborations Pharmaceuticals, Inc.

REFERENCES

  1. Brown TB, Mann B, Ryder N, Subbiah M, Kaplan J, Dhariwal P, Neelakantan A, Shyam P, Sastry G, Askell A and others. (2020). Language Models are Few-Shot Learners. Available at: https://arxiv.org/abs/2005.14165.
  2. Bubeck S, Chandrasekaran V, Eldan R, Gehrke J, Horvitz E, Kamar E, Lee P, Lee YT, Li Y, Lundberg SM and others. (2023). Sparks of Artificial General Intelligence: Early experiments with GPT-4. Available at: https://arxiv.org/abs/2303.12712.
  3. Butina D, Segall MD, Frankcombe K. (2002). Predicting ADME properties in silico: methods and models. Drug Discov Today, 7, S83–8.
  4. Chowdhery A, Narang S, Devlin J, Bosma M, Mishra G, Roberts A, Barham P, Chung HW, Sutton C, Gehrmann S and others. (2022). PaLM: Scaling Language Modeling with Pathways. arXiv:2204.02311.
  5. Cronin MTD, Belfield SJ, Briggs KA, Enoch SJ, Firman JW, Frericks M, Garrard C, Maccallum PH, Madden JC, Pastor M and others. (2023). Making in silico predictive models for toxicology FAIR. Regul Toxicol Pharmacol, 140, 105385.
  6. Di Lascio E, Gerebtzoff G, Rodriguez-Perez R. (2023). Systematic Evaluation of Local and Global Machine Learning Models for the Prediction of ADME Properties. Mol Pharm, 20, 1758–67.
  7. Dickins M, Modi S. (2002). The importance of predictive ADME simulation. Drug Discov Today, 7, 755–6.
  8. Ekins S. (2014). Progress in computational toxicology. J Pharmacol Toxicol Methods, 69, 115–40.
  9. Ekins S, Boulanger B, Swaan PW, Hupcey MA. (2002). Towards a new age of virtual ADME/TOX and multidimensional drug discovery. J Comput Aided Mol Des, 16, 381–401.
  10. Ekins S, de Groot MJ, Jones JP. (2001). Pharmacophore and three-dimensional quantitative structure activity relationship methods for modeling cytochrome p450 active sites. Drug Metab Dispos, 29, 936–44.
  11. Ekins S, Nikolsky Y, Nikolskaya T. (2005). Techniques: application of systems biology to absorption, distribution, metabolism, excretion and toxicity. Trends Pharmacol Sci, 26, 202–9.
  12. Ekins S, Ring BJ, Grace J, McRobie-Belle DJ, Wrighton SA. (2000a). Present and future in vitro approaches for drug metabolism. J Pharmacol Toxicol Methods, 44, 313–24.
  13. Ekins S, Waller CL, Swaan PW, Cruciani G, Wrighton SA, Wikel JH. (2000b). Progress in predicting human ADME parameters in silico. J Pharmacol Toxicol Methods, 44, 251–72.
  14. Ekins S, Williams AJ. (2010). Precompetitive preclinical ADME/Tox data: set it free on the web to facilitate computational model building and assist drug development. Lab Chip, 10, 13–22.
  15. Ekins S, Wrighton SA. (2001). Application of in silico approaches to predicting drug-drug interactions. J Pharmacol Toxicol Methods, 45, 65–9.
  16. Esposito C, Wang S, Lange UEW, Oellien F, Riniker S. (2020). Combining Machine Learning and Molecular Dynamics to Predict P-Glycoprotein Substrates. J Chem Inf Model, 60, 4730–49.
  17. Gill J, Moullet M, Martinsson A, Miljkovic F, Williamson B, Arends RH, Pilla Reddy V. (2023). Evaluating the performance of machine-learning regression models for pharmacokinetic drug-drug interactions. CPT Pharmacometrics Syst Pharmacol, 12, 122–34.
  18. Goller AH, Kuhnke L, Montanari F, Bonin A, Schneckener S, Ter Laak A, Wichard J, Lobell M, Hillisch A. (2020). Bayer’s in silico ADMET platform: a journey of machine learning over the past two decades. Drug Discov Today, 25, 1702–9.
  19. Gupta RR, Gifford EM, Liston T, Waller CL, Hohman M, Bunin BA, Ekins S. (2010). Using open source computational tools for predicting human metabolic stability and additional absorption, distribution, metabolism, excretion, and toxicity properties. Drug Metab Dispos, 38, 2083–90.
  20. Hamzic S, Lewis R, Desrayaud S, Soylu C, Fortunato M, Gerebtzoff G, Rodriguez-Perez R. (2022). Predicting In Vivo Compound Brain Penetration Using Multi-task Graph Neural Networks. J Chem Inf Model, 62, 3180–90.
  21. He J, Nittinger E, Tyrchan C, Czechtizky W, Patronov A, Bjerrum EJ, Engkvist O. (2022). Transformer-based molecular optimization beyond matched molecular pairs. Journal of Cheminformatics, 14, 18.
  22. Hoffmann J, Borgeaud S, Mensch A, Buchatskaya E, Cai T, Rutherford E, de Las Casas D, Hendricks LA, Welbl J, Clark A and others. (2022). Training Compute-Optimal Large Language Models. arXiv:2203.15556.
  23. Ietswaart R, Arat S, Chen AX, Farahmand S, Kim B, DuMouchel W, Armstrong D, Fekete A, Sutherland JJ, Urban L. (2020). Machine learning guided association of adverse drug reactions with in vitro target-based pharmacology. EBioMedicine, 57, 102837.
  24. Irwin R, Dimitriadis S, He J, Bjerrum EJ. (2022). Chemformer: a pre-trained transformer for computational chemistry. Machine Learning: Science and Technology, 3, 015022.
  25. Isert C, Kromann JC, Stiefl N, Schneider G, Lewis RA. (2023). Machine Learning for Fast, Quantum Mechanics-Based Approximation of Drug Lipophilicity. ACS Omega, 8, 2046–56.
  26. Janssen APA, Grimm SH, Wijdeven RHM, Lenselink EB, Neefjes J, van Boeckel CAA, van Westen GJP, van der Stelt M. (2019). Drug Discovery Maps, a Machine Learning Model That Visualizes and Predicts Kinome-Inhibitor Interaction Landscapes. J Chem Inf Model, 59, 1221–9.
  27. Kennedy T. (1997). Managing the drug discovery/development interface. Drug Discovery Today, 2, 436–44.
  28. Komura H, Watanabe R, Kawashima H, Ohashi R, Kuroda M, Sato T, Honma T, Mizuguchi K. (2021). A public-private partnership to enrich the development of in silico predictive models for pharmacokinetic and cardiotoxic properties. Drug Discov Today, 26, 1275–83.
  29. Konečný J, McMahan B, Ramage D. (2015). Federated Optimization: Distributed Optimization Beyond the Datacenter. arXiv:1511.03575.
  30. Landrum G. (2020). RDKit. Available at: https://www.rdkit.org.
  31. Lane TR, Harris J, Urbina F, Ekins S. (2023). Comparing LD50/LC50 Machine Learning Models for Multiple Species. ACS Chemical Health & Safety, 30, 83–97. [DOI] [PMC free article] [PubMed] [Google Scholar]
  32. Lewis M, Liu Y, Goyal N, Gazvininejad M, Levy O, Stoyanov V, Zettlemoyer L. (2019). BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension. https://arxiv.org/abs/1910.13461. [Google Scholar]
  33. Lipinski CA, Lombardo F, Dominy BW, Feeney PJ. (1997). Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings. Advanced Drug Delivery Reviews, 23, 3–25. [DOI] [PubMed] [Google Scholar]
  34. Liu X, Ye K, van Vlijmen HWT, Emmerich MTM, AP IJ, van Westen GJP. (2021). DrugEx v2: de novo design of drug molecules by Pareto-based multi-objective reinforcement learning in polypharmacology. J Cheminform, 13, 85. [DOI] [PMC free article] [PubMed] [Google Scholar]
  35. Lombardo F, Desai PV, Arimoto R, Desino KE, Fischer H, Keefer CE, Petersson C, Winiwarter S, Broccatelli F. (2017). In Silico Absorption, Distribution, Metabolism, Excretion, and Pharmacokinetics (ADME-PK): Utility and Best Practices. An Industry Perspective from the International Consortium for Innovation through Quality in Pharmaceutical Development. J Med Chem, 60, 9097–113. [DOI] [PubMed] [Google Scholar]
  36. Lu J, Zhang Y. (2022). Unified Deep Learning Model for Multitask Reaction Predictions with Explanation. Journal of Chemical Information and Modeling, 62, 1376–87. [DOI] [PMC free article] [PubMed] [Google Scholar]
  37. Mansouri K, Karmaus A, Fitzpatrick J, Patlewicz G, Pradeep P, Alberga D, Alepee N, Allen TEH, Allen D, Alves VM and others. (2021). Erratum: CATMoS: Collaborative Acute Toxicity Modeling Suite. Environ Health Perspect, 129, 79001. [DOI] [PMC free article] [PubMed] [Google Scholar]
  38. Martin MT, Koza-Taylor P, Di L, Watt ED, Keefer C, Smaltz D, Cook J, Jackson JP. (2022). Early Drug-Induced Liver Injury Risk Screening: “Free,” as Good as It Gets. Toxicol Sci, 188, 208–18. [DOI] [PubMed] [Google Scholar]
  39. Minerali E, Foil DH, Zorn KM, Ekins S. (2020). Evaluation of Assay Central® Machine Learning Models for Rat Acute Oral Toxicity Prediction. ACS Sustain Chem Eng, 8, 16020–7. [Google Scholar]
  40. Mittone G, Svoboda F, Aldinucci M, Lane ND, Lio P. 2023. A Federated Learning Benchmark for Drug-Target Interaction. p arXiv:2302.07684. [Google Scholar]
  41. Murad N, Pasikanti KK, Madej BD, Minnich A, McComas JM, Crouch S, Polli JW, Weber AD. (2021). Predicting Volume of Distribution in Humans: Performance of In Silico Methods for a Large Set of Structurally Diverse Clinical Compounds. Drug Metab Dispos, 49, 169–78. [DOI] [PMC free article] [PubMed] [Google Scholar]
  42. Muster W, Breidenbach A, Fischer H, Kirchner S, Muller L, Pahler A. (2008). Computational toxicology in drug development. Drug Discov Today, 13, 303–10. [DOI] [PubMed] [Google Scholar]
  43. Obrezanova O, Martinsson A, Whitehead T, Mahmoud S, Bender A, Miljkovic F, Grabowski P, Irwin B, Oprisiu I, Conduit G and others. (2022). Prediction of In Vivo Pharmacokinetic Parameters and Time-Exposure Curves in Rats Using Machine Learning from the Chemical Structure. Mol Pharm, 19, 1488–504. [DOI] [PubMed] [Google Scholar]
  44. Oldenhof M, Ács G, Pejó B, Schuffenhauer A, Holway N, Sturm N, Dieckmann A, Fortmeier O, Boniface E, Mayer C and others. 2022. Industry-Scale Orchestrated Federated Learning for Drug Discovery. p arXiv:2210.08871. [Google Scholar]
  45. Pillai N, Dasgupta A, Sudsakorn S, Fretland J, Mavroudis PD. (2022). Machine Learning guided early drug discovery of small molecules. Drug Discov Today, 27, 2209–15.
  46. Poongavanam V, Kolling F, Giese A, Goller AH, Lehmann L, Meibom D, Kihlberg J. (2023). Predictive Modeling of PROTAC Cell Permeability with Machine Learning. ACS Omega, 8, 5901–16.
  47. Ramesh A, Dhariwal P, Nichol A, Chu C, Chen M. (2022). Hierarchical Text-Conditional Image Generation with CLIP Latents. Available at: https://arxiv.org/abs/2204.06125.
  48. Rao M, Nassiri V, Alhambra C, Snoeys J, Van Goethem F, Irrechukwu O, Aleo MD, Geys H, Mitra K, Will Y. (2023). AI/ML Models to Predict the Severity of Drug-Induced Liver Injury for Small Molecules. Chem Res Toxicol, 36, 1129–39.
  49. Rao MS, Gupta R, Liguori MJ, Hu M, Huang X, Mantena SR, Mittelstadt SW, Blomme EAG, Van Vleet TR. (2019). Novel Computational Approach to Predict Off-Target Interactions for Small Molecules. Front Big Data, 2, 25.
  50. Schaduangrat N, Lampa S, Simeon S, Gleeson MP, Spjuth O, Nantasenamat C. (2020). Towards reproducible computational drug discovery. J Cheminform, 12, 9.
  51. Schmidt F, Wenzel J, Halland N, Gussregen S, Delafoy L, Czich A. (2019). Computational Investigation of Drug Phototoxicity: Photosafety Assessment, Photo-Toxophore Identification, and Machine Learning. Chem Res Toxicol, 32, 2338–52.
  52. Selick HE, Beresford AP, Tarbit MH. (2002). The emerging importance of predictive ADME simulation in drug discovery. Drug Discov Today, 7, 109–16.
  53. Siramshetty V, Williams J, Nguyen ÐT, Neyra J, Southall N, Mathe E, Xu X, Shah P. (2021). Validating ADME QSAR Models Using Marketed Drugs. SLAS Discov, 26, 1326–36.
  54. Srivastava A, Rastogi A, Rao A, Shoeb AAM, Abid A, Fisch A, Brown AR, Santoro A, Gupta A, Garriga-Alonso A and others. (2022). Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models. arXiv:2206.04615.
  55. Stahl M, Guba W, Kansy M. (2006). Integrating molecular design resources within modern drug discovery research: the Roche experience. Drug Discov Today, 11, 326–33.
  56. Stoyanova R, Katzberger PM, Komissarov L, Khadhraoui A, Sach-Peltason L, Groebke Zbinden K, Schindler T, Manevski N. (2023). Computational Predictions of Nonclinical Pharmacokinetics at the Drug Design Stage. J Chem Inf Model, 63, 442–58.
  57. Tysinger EP, Rai BK, Sinitskiy AV. (2023). Can We Quickly Learn to “Translate” Bioactive Molecules with Transformer Models? J Chem Inf Model, 63, 1734–44.
  58. van de Waterbeemd H, Gifford E. (2003). ADMET in silico modelling: towards prediction paradise? Nat Rev Drug Discov, 2, 192–204.
  59. Van Rompaey D, Morrison D, Van Den Bergh A, Wegner JK. (2023). A Symbolic Regression Model for the Prediction of Drug Binding to Human Liver Microsomes. Mol Pharm, 20, 2436–42.
  60. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser L, Polosukhin I. (2017). Attention Is All You Need. arXiv:1706.03762.
  61. Volak LP, Duevel HM, Humphreys S, Nettleton D, Phipps C, Pike A, Rynn C, Scott-Stevens P, Zhang D, Zientek M. (2023). Industry Perspective on the Pharmacokinetic and ADME Characterization of Heterobifunctional Protein Degraders. Drug Metab Dispos.
  62. Weininger D. (1988). SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. Journal of Chemical Information and Computer Sciences, 28, 31–6.
  63. Willighagen EL, Mayfield JW, Alvarsson J, Berg A, Carlsson L, Jeliazkova N, Kuhn S, Pluskal T, Rojas-Cherto M, Spjuth O and others. (2017). The Chemistry Development Kit (CDK) v2.0: atom typing, depiction, molecular formulas, and substructure searching. J Cheminform, 9, 33.
  64. Winiwarter S, Chang G, Desai P, Menzel K, Faller B, Arimoto R, Keefer C, Broccatelli F. (2019). Correction to “Prediction of Fraction Unbound in Microsomal and Hepatocyte Incubations: A Comparison of Methods across Industry Datasets”. Mol Pharm, 16, 4755.
  65. Yu H, Adedoyin A. (2003). ADME-Tox in drug discovery: integration of experimental and computational technologies. Drug Discov Today, 8, 852–61.
  66. Zeller A, Brigo A, Brink A, Guerard M, Lang D, Muster W, Runge F, Sutter A, Vock E, Wichard J and others. (2020). Genotoxicity Assessment of Drug Metabolites in the Context of MIST and Beyond. Chem Res Toxicol, 33, 10–9.
  67. Zhang J, Norinder U, Svensson F. (2021). Deep Learning-Based Conformal Prediction of Toxicity. J Chem Inf Model, 61, 2648–57.

RESOURCES