Portland Press Open Access
Emerging Topics in Life Sciences. 2025 Jun 17: ETLS20240005. doi: 10.1042/ETLS20240005

AI in drug design: evolution or revolution?

Darren V S Green
PMCID: PMC12261213

Introduction

The pharmaceutical industry is familiar with the ‘hype cycle’ of technologies, artificial intelligence (AI) being the most recent. AI is best thought of as a nested set of capabilities: machine learning (ML; models that learn from legacy data), deep learning (ML models that mimic human brain processes), generative AI (the use of ML to create original content) and the ultimate goal of artificial general intelligence (systems capable of conducting scientific research and discovering new knowledge).

Great claims are made for AI in drug discovery – a revolution is coming, according to McKinsey [1]. AI-based startups have attracted substantial backing, with an estimated $4 billion invested in the leading 20 companies between 2018 and 2022 [2], and the AI services market in drug discovery is expected to reach almost $8 billion per annum by 2030 [3]. Recently, Xaira Therapeutics spun out of the University of Washington Baker lab with $1 billion in funding [4]. Given the lack of impact on pharma productivity from previous technology ‘game-changers’ [5], how much is based on real evidence and how much is wishful thinking? Exactly where and how will AI disrupt established practices in drug discovery? This perspective aims to shed some light on these questions and will hopefully convince you that there is already enough evidence that, this time, the journey along the technology hype curve will be different.

AI methods

Without descending into the nuances of deep learning network architectures and the like, it will be useful to introduce common ML terminology and utility. The comprehensive review by Yang et al. [6] is recommended for further reading.

‘Classical machine learning’ is a term generally applied to the collection of methods which pre-date ‘deep learning’. Supervised learning methods (i.e. those trained to predict a specific labelled endpoint such as logP) include support vector machines, naïve Bayes, k-nearest neighbours and random forests. Unsupervised learning methods (i.e. where the data are unlabelled) include clustering, principal component analysis and self-organising maps. These methods are fed descriptors (e.g. chemical structure fingerprints [7]) and produce a mathematical model that relates the descriptors to the desired endpoint (supervised) or allows a data-driven representation of the molecules in the descriptor space (unsupervised).
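As a minimal sketch of how fingerprint descriptors feed a simple model, consider a one-nearest-neighbour activity classifier using the Tanimoto similarity common in cheminformatics; the bit-vector fingerprints and activity labels below are invented for illustration:

```python
# Toy fingerprints (binary bit vectors) fed to a simple learner:
# 1-nearest neighbour with Tanimoto similarity. Data are invented.

def tanimoto(a, b):
    """Tanimoto similarity between two binary fingerprints."""
    on_both = sum(1 for x, y in zip(a, b) if x and y)
    on_either = sum(1 for x, y in zip(a, b) if x or y)
    return on_both / on_either if on_either else 0.0

def predict_1nn(query, training_set):
    """Label the query with the class of its most similar neighbour."""
    best = max(training_set, key=lambda item: tanimoto(query, item[0]))
    return best[1]

train = [
    ([1, 1, 0, 1, 0, 0], "active"),
    ([1, 0, 0, 1, 1, 0], "active"),
    ([0, 0, 1, 0, 0, 1], "inactive"),
]
query = [1, 1, 0, 1, 1, 0]
print(predict_1nn(query, train))  # nearest neighbours are "active" compounds
```

Real applications would use much longer fingerprints (e.g. 1024–2048 bits) and far larger training sets, but the descriptor-in, prediction-out shape of the problem is the same.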

Deep learning methods have been key to the emergence of modern AI. Deep learning typically refers to a learning system incorporating multiple layers of artificial neural networks. Such networks are very flexible learners and are able to model many types of data (e.g. medical images, face recognition, speech, music and, of course, molecular data) and highly complex, non-linear relationships. They are particularly powerful when given very large data sets, for example the 1.2 million images used by the breakthrough AlexNet image classification system [8].

A key departure from classical ML is the ability of deep learning models to learn the most effective representation of the data, rather than use fixed, human-engineered descriptors. Molecules can be represented as graphs or as SMILES strings [9] or proteins as sequences of their shorthand amino-acid letters, with their actual representation in the model refined by the model training process.
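To make the input side concrete, here is a minimal sketch of turning SMILES strings into the integer token sequences a deep learning model would consume; character-level tokenisation is used for simplicity, and real models often use richer tokenisation schemes:

```python
# Character-level tokenisation of SMILES strings: the raw sequence input
# whose internal representation a deep model then refines during training.
# Vocabulary is built from the data itself; molecules are illustrative.

smiles = ["CCO", "c1ccccc1", "CC(=O)O"]   # ethanol, benzene, acetic acid
vocab = {ch: i for i, ch in enumerate(sorted({c for s in smiles for c in s}))}

def encode(s):
    """Map a SMILES string to a list of integer token ids."""
    return [vocab[c] for c in s]

print(vocab)
print(encode("CCO"))
```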

The flexibility of deep learning networks has enabled a large number of variants and types of learning:

  • Multitask learning allows learning of several, related endpoints (e.g. IC50 data from kinase panels) in parallel, making use of a shared representation which can be very useful where some endpoints have large data and others small data.

  • Transfer learning enables fine-tuning of a model which has been pre-trained on a large corpus of data, by further training on a much smaller but focussed data set.

  • Reinforcement learning is a reward-driven learning strategy that enables the optimisation of a model to predict an outcome without a priori knowing how to optimise; it is often used in combination with other computational models which may impose penalties (e.g. developability models) or rewards (e.g. fit to a pharmacophore model).

  • Contrastive learning is a semi-supervised learning method which attempts to learn a latent space where similar data are close and dissimilar data are far apart. It is particularly useful at integrating disparate data types such as image and chemical structure data [10].

  • Diffusion models [11] are a class of supervised learning methods gaining in popularity. Noise is incrementally added to the training data (e.g. a set of images) until the new data set is a Gaussian distribution. A deep learning model is then trained to follow the reverse process (i.e. start with noise and recreate the input image). This creates a model that is a very efficient generative tool for images and, increasingly, molecular design [12,13].

  • Recurrent neural network (RNN) models have proved a powerful tool for generative chemistry [14]. Designed for modelling time-series and sequence data, and particularly useful for language, translation and speech models, RNNs (particularly the long short-term memory variant [15]) use the context of a word or character to modify the prediction of what the next word or character will be.

  • Large language models (LLMs) [16] are a hot topic owing to the extraordinary impact of OpenAI’s ChatGPT and similar models. These are extremely large models trained on extremely large data sets. In contrast with RNNs, LLMs use a transformer architecture that enables self-learning plus parallelisation of training. LLMs are able to understand entire conversations and context. Interestingly, the feature of LLMs that often irritates everyday use (hallucination) is the one that is most useful for molecule design – producing plausible SMILES or protein sequences that have never been reported.
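The forward (noising) half of the diffusion process described above can be sketched in a few lines; the schedule values and the ‘data’ point are invented for illustration:

```python
import math
import random

# Forward (noising) process of a diffusion model: data are blended with
# Gaussian noise until only noise remains. The closed-form jump to step t is
# x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps, eps ~ N(0, 1).
# Schedule values and the toy data point are invented for illustration.

random.seed(0)

def noise_step(x, alpha_bar):
    """Return a noised copy of x at cumulative signal level alpha_bar."""
    return [math.sqrt(alpha_bar) * xi + math.sqrt(1 - alpha_bar) * random.gauss(0, 1)
            for xi in x]

x0 = [1.0, -0.5, 0.25, 0.8]      # a toy "clean" data point
early = noise_step(x0, 0.99)     # almost unchanged
late = noise_step(x0, 0.001)     # essentially pure Gaussian noise
print(early)
print(late)
```

The generative model is then trained on the reverse direction, learning to recover x0 from such noised samples.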

One other learning technique should be mentioned. Active learning is an optimisation method that uses model uncertainty to guide the next data acquisition, either from an existing data set or from the next experiment in a design–make–test cycle. Generally, active learning approaches will seek to suggest data that will improve the model (‘Explore’), until the model has reached a point where it can confidently predict (‘Exploit’).
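A toy version of this explore/exploit loop, with a hypothetical oracle function standing in for the experiment and distance to the nearest labelled point as a crude stand-in for model uncertainty:

```python
# Minimal active-learning sketch over a pool of candidates. Uncertainty is
# approximated by distance to the nearest already-labelled point; each cycle
# acquires the most uncertain candidate ("explore"). The oracle standing in
# for the experiment, and all data, are invented for illustration.

def oracle(x):                     # hypothetical "experiment"
    return "active" if x >= 5 else "inactive"

pool = [0, 1, 2, 4, 6, 8, 9]
labelled = {0: oracle(0), 9: oracle(9)}   # two seed measurements

def uncertainty(x):
    """Distance to the nearest labelled point (higher = less certain)."""
    return min(abs(x - seen) for seen in labelled)

for _ in range(3):                 # three design-make-test cycles
    candidates = [x for x in pool if x not in labelled]
    pick = max(candidates, key=uncertainty)
    labelled[pick] = oracle(pick)  # "run the experiment"
    print("acquired", pick, "->", labelled[pick])
```

Once the unlabelled candidates all sit close to measured points, the model switches naturally from ‘Explore’ to confident ‘Exploit’ predictions.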

Application of AI in small molecule discovery

ML in chemistry is not new. In fact, chemistry has its own name for statistical models: quantitative structure–activity relationship (QSAR) models. Initially, these were linear regression models, the first published at the turn of the 20th century by Overton [17] and Meyer [18]. These ideas were famously developed by Hansch & Fujita [19]. QSAR has continued to evolve as new methods were invented [20], with the chemistry community popularising the multivariate technique of partial least squares [21]. QSAR modellers were early adopters of neural networks [22], kernel ML methods [23], random forests [24], active learning [25], automated design [26], AI-based design processes [27], Pareto-based multi-objective design [28,29] and automated QSAR modelling/MLOps [30,31]. QSAR models have been used in the design of marketed drugs [32] and are established tools in a regulatory setting for risk assessments of organic compounds [33].
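A Hansch-style QSAR in miniature: ordinary least squares relating a measured activity to a single descriptor (here logP). The data points are invented for illustration, not taken from the cited studies:

```python
# One-descriptor QSAR fitted by closed-form ordinary least squares:
# activity = intercept + slope * logP. Data are invented for illustration.

logp = [1.0, 1.5, 2.0, 2.5, 3.0]
activity = [4.1, 4.6, 5.0, 5.6, 6.0]        # e.g. pIC50 values

n = len(logp)
mx = sum(logp) / n
my = sum(activity) / n
slope = (sum((x - mx) * (y - my) for x, y in zip(logp, activity))
         / sum((x - mx) ** 2 for x in logp))
intercept = my - slope * mx

def predict(x):
    """Predicted activity for a given logP."""
    return intercept + slope * x

print(f"activity = {intercept:.2f} + {slope:.2f} * logP")
```

Historical Hansch equations simply add more terms (e.g. Hammett sigma, steric parameters) to the same linear form.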

If ML is not new to drug design, why then the current excited interest, and what has enabled it? The growth in computing power (an iPhone 12 is 5,000 times faster than the Cray-2, the world’s fastest supercomputer in 1985!) and the almost commodity pricing of very large memory and storage have enabled computational scientists to employ methods that were hitherto either impractical or infeasible. On a practical level, greater computational power has also accelerated the speed with which researchers develop new solutions, reducing the iteration time for each cycle of testing. Here is a non-exhaustive list of the most interesting developments (note: not all are AI applications):

  • Large-scale, big data cheminformatics, such as matched molecular pairs [34,35] and series [36]

  • Large-scale ML hyperparameter optimisation to yield optimal models [37]

  • Large enumerations of readily accessible chemistry space [38]

  • Free energy perturbation (FEP, invented in 1987 and now usable!) [39]

  • Deep learning QSAR models for large datasets [40]

  • Deep learning-based generative chemistry [41]

  • ML-based forcefields with DFT levels of accuracy [42]

  • Accurate protein structure prediction with AlphaFold & RoseTTAFold [43,44]

  • Multi-task [45] and multi-modal [46] modelling of complex data sets

  • Small [47] or large language models trained on chemistry [48] and protein sequence [49]

  • Transfer learning from pre-trained/foundational models for small data sets [50,51]

  • Federated modelling for safe data sharing between companies [52,53]

This is an impressive list of capabilities, but do they work in the real world? In short, it appears so. In their review of generative chemistry, Du et al. [41] cite no fewer than 37 published examples of laboratory validated small molecule design using generative chemistry methods.

The first published example of generative chemistry design is Insilico Medicine’s DDR1 inhibitor [54], designed, synthesised and tested in 21 days. This was a controversial example, being extremely close to the known marketed drug Ponatinib (Figure 1a) and subject to a ‘well, any chemist would have done that’ response. A more charitable view needs to be taken – these new design paradigms must be able to do the ordinary as well as – hopefully – the extraordinary. A more novel DDR1 inhibitor was discovered by Yoshimori et al. [55] (Figure 1a) by coupling a generative chemistry model with a traditional pharmacophore approach. More ambitious was the coupling of an automated design system with an automated on-chip chemical synthesis platform to generate novel LXRα agonists (Figure 1b) [56]. More recently, a collaboration between Pfizer and PostEra reported the ML-driven discovery of a series of potent, selective and orally available SARS-CoV-2 PLpro inhibitors, with the lead compound (active in a mouse model) identified in less than eight months [57] (Figure 1c).

Figure 1: Chemical structures of compounds designed using generative chemistry methods.


(a) DDR1 inhibitors: the marketed drug Ponatinib and those designed in reference [54] and reference [55]. (b) An LXR ligand designed in reference [56]. (c) The starting point (GRL0617) and optimised compound (PF-07957472) designed in reference [57].

There are other validated computational protocols for automated design that use more traditional computational chemistry and cheminformatics. The first published example of modern automated design was provided by Besnard et al. [58], whereby novel compounds were generated using cheminformatics methods and scored with QSAR models which were combined to drive multi-objective optimisation. Using this approach, CNS-penetrant, selective dopamine D2 inverse agonists and compounds fitting a polypharmacological profile were designed. Schrödinger has pioneered large-scale cheminformatics and free energy simulation to drive lead optimisation. The discovery of the MALT1 inhibitor SGR-1505 [59] used a computational pipeline involving the generation of 8 billion compounds through reaction-based enumeration, an active learning FEP protocol to build a machine learning model that could triage large numbers of compounds before committing to full free energy simulation, followed by multiparameter optimisation using ML QSAR models. Using this intense computational process, the project needed only 10 months and 78 synthesised compounds to reach a clinical candidate [60].

ML has also been applied to hit identification and virtual screening. The size of available ‘make to order’ libraries is becoming extremely large – over 10^12 compounds and growing – and searching them with traditional methods (pharmacophore searching, docking) is accordingly expensive. Klarich et al. [61] utilised an active learning approach called Thompson sampling to make the search process more efficient, needing to evaluate only 1% of the virtual library to find >50% of the known hits. The approach can be coupled with any type of screening method; they demonstrate 3D shape searching and docking. An alternative solution to this problem is the NGT (NeuralGenThesis) method of Oliveira et al. [62]. NGT uses deep learning to project a 3 trillion compound vendor catalogue into a ‘latent space’ with an associated decoder to regenerate chemical structures. The virtual screen can then iteratively sample promising compounds from the latent space, generate the structures via the decoder, and score them using, in this case, docking to a crystal structure of the activated receptor, an AlphaFold model and a homology model. The example given describes the identification of potent and selective inhibitors of the melanocortin-2 receptor.
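A much-simplified sketch of Thompson sampling as a bandit over reagent ‘arms’: each arm keeps a Beta posterior over its hit rate, and each round the best-looking sampled arm is evaluated. The hit rates are invented, and a Bernoulli reward stands in for the docking or shape-search scores used in practice:

```python
import random

# Thompson sampling in miniature. Each reagent "arm" keeps a Beta(alpha, beta)
# posterior over its hit rate; every round we sample a plausible rate for each
# arm, evaluate the arm with the highest sample, and update its posterior.
# True hit rates are invented; real use scores compounds by docking/shape.

random.seed(42)
true_hit_rate = {"R1": 0.05, "R2": 0.40, "R3": 0.10}     # hypothetical
posterior = {arm: [1, 1] for arm in true_hit_rate}        # [alpha, beta]

for _ in range(300):
    draws = {a: random.betavariate(*posterior[a]) for a in posterior}
    arm = max(draws, key=draws.get)
    hit = random.random() < true_hit_rate[arm]            # the "experiment"
    posterior[arm][0 if hit else 1] += 1                  # Bayesian update

pulls = {a: sum(posterior[a]) - 2 for a in posterior}
print(pulls)   # evaluations concentrate on the most promising arm
```

The sampling step is what balances exploration and exploitation: poorly characterised arms occasionally produce optimistic draws and get evaluated, while consistently good arms dominate over time.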

More ambitious than searching in pre-defined chemical libraries is the de novo generation of hit molecules. Thomas et al. [63] utilised an LLM pre-trained on ChEMBL [64] with the goal of generating novel chemical structures with a low-energy docking score for seven known A2A protein crystal structures, alongside a variety of developability metrics such as logP, hydrogen bond donors and rotatable bonds. After extensive filtering, nine compounds were synthesised, yielding three nanomolar ligands with confirmed functional activity, two of which are novel chemotypes.

An emerging hit discovery strategy is to apply ML to screening data from DNA-encoded libraries and use the resulting model to predict activity in databases of commercially available compounds, thus saving the resource cost of off-DNA resynthesis. An example of this is the discovery of a low micromolar, first-in-class ligand for WDR91 [65], testing only 150 commercial compounds.

Biologics

Protein design is a younger discipline than its small molecule cousin [66]. It has its origins in protein engineering, where known proteins are mutated to gain information, to optimise a function or to repurpose the protein for another function. In this use case, the protein structure fold, stability and dynamics tend to be retained. This is not a trivial pursuit, as demonstrated by the award of a Nobel Prize in 2018 [67]. In the last two decades, however, protein design has made extraordinary progress utilising both ‘physics-based’ structural modelling and, of course, machine learning [68], culminating in the award of its own Nobel Prize in 2024 [69]. AlphaFold [70], RoseTTAFold [44] and the evolutionary-scale LLM (ESM) family [71] are leading examples of these impressive new capabilities, which are set to affect the design of enzymes, antibodies, vaccines, nanomachines and more [68]. These methods are built on the billions of publicly available sequences which sample diverse protein families and encode evolutionary constraints on the sequence–structure relationship, supplemented by >200,000 protein structures in the PDB [72].

AlphaFold successfully bridged the disciplines of bioinformatics, structural biology and ML by using multiple sequence alignments (MSA), patterns of conformations/interactions observed in protein crystal structures, and a deep learning architecture adopted from natural language processing [73]. AlphaFold3 was trained to predict not only protein structures but also biomolecular complexes of proteins, nucleic acids and their ligands (Figure 2). AlphaFold3 has an updated learning architecture to reduce dependency on the MSA and has introduced a diffusion model that creates the atomic co-ordinates of the models.

Figure 2: Example structures predicted using AF3.


(a) Bacterial CRP/FNR family transcriptional regulator protein bound to DNA and cGMP (PDB 7PZB). (b) Human coronavirus OC43 spike protein, 4665 residues, heavily glycosylated and bound by neutralising antibodies (PDB 7PNM). (c) AF3 performance on PoseBusters (v.1, August 2023 release), a recent PDB evaluation set and CASP15 RNA. (d) AF3 architecture for inference. The rectangles represent processing modules and the arrows show the data flow. Yellow, input data; blue, abstract network activations; green, output data. The coloured balls represent physical atom co-ordinates. Reproduced from reference [70] under the Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/).

RoseTTAFold builds on its protein-modelling heritage, utilising a residue-based representation of amino acids and DNA bases, 1D sequences, 2D pairwise distance information from homologous proteins and 3D co-ordinate information as input to a deep learning architecture. The RoseTTAFold Diffusion method (RFDiffusion) [74] utilises a diffusion model to create the final atomic model.

The ESM family of models starts from a completely different area of ML – that of LLMs. ESM-2 is trained on over 65 million unique sequences using a technique known as masked language modelling, whereby sequences in the training set have (in this case) a random 15% of amino acids ‘masked’ and the model is trained to predict them correctly. This strategy removes the need for sequence alignments. The sequence model is then passed to a folding model, which benefits from a low-resolution picture of protein structure (such as residue–residue contact probabilities) learnt by the LLM.
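The masking step itself is simple to sketch; the sequence below is arbitrary, and real training applies this across millions of sequences:

```python
import random

# The masking step behind masked language modelling, sketched on a single
# protein sequence: a random 15% of residues are replaced by a <mask> token
# and the model is trained to recover them. The sequence is arbitrary.

random.seed(7)
sequence = list("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ")
n_mask = max(1, round(0.15 * len(sequence)))
positions = random.sample(range(len(sequence)), n_mask)

masked = sequence.copy()
targets = {}
for p in positions:
    targets[p] = masked[p]        # what the model must predict
    masked[p] = "<mask>"

print("".join(t if t != "<mask>" else "?" for t in masked))
print(targets)                    # position -> hidden residue
```

Because the targets come from the sequence itself, no labels or alignments are needed; the training signal is entirely self-generated.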

Successful applications of state-of-the-art protein design tools are impressive. The AlphaProteo design system [75] (based on AlphaFold) designed novel protein binders for eight diverse target proteins. Binders were experimentally verified for seven proteins, with affinities ranging from 80 picomolar to low nanomolar. Two were tested for biological function, demonstrating inhibition of VEGF signalling in human cells and SARS-CoV-2 neutralisation in Vero monkey cells. Designed binder and binder–target complex structures were confirmed with X-ray crystallography and cryo-EM.

RFDiffusion was able to design de novo protein binders for four protein targets: Influenza Haemagglutinin A, IL-7 Receptor-α, PD-L1 and TrkA receptor with Kd of 28 nM, 30 nM, 1.4 mM and 328 nM, respectively. In the same paper, de novo proteins with mixed alpha-beta topologies are designed, characterised with circular dichroism and their thermostability validated. Symmetric oligomers with unprecedented structures were designed, as were novel proteins designed to ‘scaffold’ known binding sites (e.g. the scaffolding of the p53 helix that binds MDM2) and enzyme active sites (e.g. a retroaldolase active site triad TYR1051-LYS1083-TYR1180).

The ESM LLM was used to affinity mature seven human immunoglobulin G (IgG) antibodies that bind to antigens from coronavirus, ebolavirus and influenza A virus representing diverse degrees of maturity. In each case, affinity was improved after creating 20 or fewer new variants of each antibody, across only two rounds of evolution. Although many of the suggested mutations would be considered common in nature, 5/32 affinity-enhancing mutations involved a rare or uncommon substitution. One surprising but effective substitution was that of a glycine in the wild-type (observed in 99% of natural antibody sequences) to a proline (observed in <1% of natural sequences).

‘One-shot’, ML-enabled de novo antibody design has been reported [76], using a model trained on known antibody–antigen complex structures. As validation of the method, the approved antibody trastuzumab and its antigen HER2 were taken as a case study. Novel HCDR3 and HCDR123 sequences (diverse with respect to trastuzumab and to each other) were generated from the model and validated using SPR, with 71 having affinities below 10 nM. Three antibodies had a higher affinity for HER2 than trastuzumab.

LLMs as an orchestrator of experiments

No article would be complete without mentioning the integration of ML with experiment planning and execution. ChemCrow [77] and Coscientist [78] are LLM-based systems which design, plan and execute complex experiments. The user interface is the LLM and it is augmented with modules or agents which are designed for very specific tasks (e.g. web search, retrosynthetic analysis, structure to price, programming of liquid handlers). The LLM is able to take user instruction, e.g. ‘Find and synthesize a thiourea organocatalyst which accelerates a Diels-Alder reaction’, orchestrate the various tools to produce an answer and even create code to drive an automated synthesis platform. ChemCrow was able to design a new chromophore with a predicted maximum absorption wavelength of 369 nm and a two-step synthetic protocol from available starting materials. Coscientist was able to orchestrate iterative experiments to optimise conditions for both Suzuki coupling and Buchwald–Hartwig reactions (Figure 3).
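A toy of this orchestration pattern, with invented stand-in tools; the real systems use an LLM to choose among tools and interpret their output, whereas the routing here is hard-coded for illustration:

```python
# Toy of the tool-orchestration pattern behind ChemCrow/Coscientist: a
# controller routes a request through special-purpose tools and assembles
# the result. The tools, catalogue and routing are invented stand-ins.

def price_lookup(compound):
    """Hypothetical 'structure to price' tool."""
    catalogue = {"benzaldehyde": 25.0, "thiourea": 12.0}
    return catalogue.get(compound)

def retrosynthesis(target):
    """Hypothetical retrosynthetic-analysis tool."""
    routes = {"thiourea organocatalyst": ["thiourea", "benzaldehyde"]}
    return routes.get(target, [])

TOOLS = {"price": price_lookup, "retro": retrosynthesis}

def orchestrate(target):
    """Plan a synthesis, then cost its starting materials."""
    materials = TOOLS["retro"](target)
    return {m: TOOLS["price"](m) for m in materials}

print(orchestrate("thiourea organocatalyst"))
```

In ChemCrow and Coscientist, the step this hard-codes – deciding which tool to call next and how to combine the answers – is itself performed by the LLM.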

Figure 3: Cross-coupling Suzuki and Sonogashira reaction experiments designed and performed by Coscientist.


(a) Overview of Coscientist’s configuration. (b) Available compounds (DMF, dimethylformamide; DiPP, 2,6-diisopropylphenyl). (c) Liquid handler setup. (d) Solving the synthesis problem. (e) Comparison of reagent selection performance with a large dataset. (f) Comparison of reagent choices across multiple runs. (g) Overview of justifications made when selecting various aryl halides. (h) Frequency of visited URLs. (i and j) Analytical data on the synthesised materials compared with pure standards. Reproduced from reference [78] under the Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/).

Are we there yet?

The above examples illustrate the potential of ML tools to improve the rate of scientific discovery. However, these examples represent the state of the art, and publication bias (see later) is very real – these are published because they are successful. Perhaps the best way to comment is to explore the limitations of the current tools.

First and foremost, ML relies on good data, preferably in large quantities. Where these exist, the resultant models can be impressive. But large, high-quality scientific data are expensive to acquire: the cost of replacing the protein structure data in the PDB is conservatively estimated at $20 billion [79]. Data are the recurring issue, particularly for applications such as drug discovery, where the application domain is always outside or on the edge of the training set [80]. Extrapolation is the requirement for usable ML models, and here we are still struggling to understand what these models are actually learning. High-profile docking models were exposed as learning the data but no physics [81], whilst there are justifiable concerns about overfitting to errors in data [82]. Even AlphaFold3 has been shown to memorise conformations and not the physics which underpin them [83]. This potential for memorisation and lack of causal reasoning has led to a call to make AI more scientific [84]. As one researcher noted [85], LLMs and other AI systems ‘lack the basic capacities for intersubjectivity, semantics and ontology that are preconditions for the kind of collaborative world-making that allows scientists to theorize, understand, innovate and discover’.

ML researchers rely on public domain benchmarks to judge the effectiveness of their new algorithms. It was the CASP (Critical Assessment of Structure Prediction) [86] challenges that enabled the revolution in protein structure prediction. There are no comparable benchmarks for real world drug discovery, and this remains a constraint on the field [87]. The literature is full of publication bias – there is no ‘Journal of Failed Chemical Reactions’.

Evolution or revolution in drug design?

There is no ignoring the impact of ML, or denying its potential in the coming years. How will it benefit drug discovery? That will depend on implementation, because this is a disruptive technology – getting the best out of it requires business process re-engineering [88]. AI demands data. With the appropriate data, ML-driven design and discovery will perform well; getting the right data quickly and cheaply is the challenge.

Biologics discovery will most probably be the first to feel the benefits, as many of the necessary experiments are already largely automated and the performance of the foundational models is impressive.

In small molecule discovery, we are likely to see a dual-track adoption. On one track, new companies are built around automated design (e.g. Exscientia, now merged with Recursion), in much the same way that companies were formed to pursue structure-based design (Vertex) and fragment-based design (Astex). On the other, more established companies will need to overcome the well-established “human centric” model [89] of the designer-maker medicinal chemist, which is not well placed to adopt the new approaches. Change management in this community can be a difficult business [90]. Indeed, McKinsey estimates that change management costs three times as much as the development of generative AI solutions [91]. But change will need to come, and it is not out of place to mention Kodak [92] as a cautionary tale at this point.

Moving ML models away from interpolation and towards extrapolation/reasoning and mechanistic thinking is necessary. AlphaFold3 has probably extracted as much out of current public domain data as is possible. A possible solution to both of these issues could be greater integration of physics-based models and simulation as a source of data [93].

What we do know is that the pace of change in the ML world is faster than any previous technology change we have witnessed. The next few years will see the growth in ‘lab in the loop’ [94,95] and even Autonomous Discovery [96–98] approaches as ML, informatics and experimental automation converge. We will remember this period as the time when a Revolution started. Drug design will look very different in the future, even if at the moment it is difficult to predict what the end state will look like.

Summary Points

  • Machine learning (ML) in drug discovery builds on decades of innovation in bioinformatics, cheminformatics and computational chemistry.

  • ML adds significant capabilities to the computational toolbox, in some cases providing a significant leap in performance.

  • The literature is full of successful examples of ML-driven design in both small molecules and biologics.

  • Having the right data – both quality and quantity – is key to success.

  • This is a disruptive technology which will change how scientists work.

Abbreviations

AI

artificial intelligence

FEP

free energy perturbation

LLM

large language model

ML

machine learning

QSAR

quantitative structure activity relationship

RNN

recurrent neural network

Competing Interests

The author declares that there are no competing interests associated with the manuscript.

References

  • 1.Devereson, A., Sandler, C. and McKinsey, L. (2022) How AI could revolutionize drug discovery. https://www.mckinsey.com/industries/life-sciences/our-insights/how-ai-could-revolutionize-drug-discovery
  • 2.Mikulic, M. (2023) Statista. Investment in leading AI-focused biotech companies worldwide in 2018-2022, by use case. https://www.statista.com/statistics/1428307/investment-in-ai-focused-biotech-companies-by-use-case/
  • 3.Rawal, J. (2024) Artificial Intelligence (AI) in drug discovery market size. Fortune Business Insights. https://www.fortunebusinessinsights.com/artificial-intelligence-in-drug-discovery-market-105354
  • 4.Temkin, S. (2024) Techcrunch. Xaira, an AI drug discovery startup, launches with a massive $1B, says it’s ‘ready’ to start developing drugs. https://techcrunch.com/2024/04/24/xaira-an-ai-drug-discovery-startup-launches-with-a-massive-1b-says-its-ready-to-start-developing-drugs
  • 5.Scannell, J.W., Blanckley, A., Boldon, H. and Warrington, B. (2012) Diagnosing the decline in pharmaceutical R&D efficiency. Nat. Rev. Drug Discov. 11, 191–200 10.1038/nrd3681 [DOI] [PubMed] [Google Scholar]
  • 6.Yang, X., Wang, Y., Byrne, R., Schneider, G. and Yang, S. (2019) Concepts of artificial intelligence for computer-assisted drug discovery. Chem. Rev. 119, 10520–10594 10.1021/acs.chemrev.8b00728 [DOI] [PubMed] [Google Scholar]
  • 7.Yang, J., Cai, Y., Zhao, K., Xie, H. and Chen, X. (2022) Concepts and applications of chemical fingerprint for hit and lead screening. Drug Discov. Today 27, 103356 10.1016/j.drudis.2022.103356 [DOI] [PubMed] [Google Scholar]
  • 8.Krizhevsky, A., Sutskever, I. and Hinton, G.E. (2017) ImageNet classification with deep convolutional neural networks. Commun. ACM 60, 84–90 10.1145/3065386 [DOI] [Google Scholar]
  • 9.Weininger, D. (1988) SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. J. Chem. Inf. Comput. Sci. 28, 31–36 10.1021/ci00057a005 [DOI] [Google Scholar]
  • 10.Nguyen, C.Q., Pertusi, D. and Branson, K.M. (2023) Molecule-morphology contrastive pretraining for transferable molecular representation [preprint arXiv:2305.09790]. arXiv. 10.48550/arXiv.2305.09790 [DOI]
  • 11.Ho, J., Jain, A. and Abbeel, P. (2020) Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems, Vol. 33 of, pp. 6840–6851 [Google Scholar]
  • 12.Guo, Z., Liu, J., Wang, Y., Chen, M., Wang, D., Xu, D.et al. (2024) Diffusion models in bioinformatics and computational biology. Nat. Rev. Bioeng. 2, 136–154 10.1038/s44222-023-00114-9 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13.Igashov, I., Stärk, H., Vignac, C., Schneuing, A., Satorras, V.G., Frossard, P.et al. Equivariant 3D-conditional diffusion model for molecular linker design. Nat. Mach. Intell. 6, 417–427 10.1038/s42256-024-00815-9 [DOI] [Google Scholar]
  • 14.Bjerrum, E.J. and Threlfall, R. (2017) Molecular generation with recurrent neural networks (RNNs) [preprint arXiv:1705.04612]. arXiv. 10.48550/arXiv.1705.04612 [DOI]
  • 15.Van Houdt, G., Mosquera, C. and Nápoles, G. (2020) A review on the long short-term memory model. Artif. Intell. Rev. 53, 5929–5955 10.1007/s10462-020-09838-1 [DOI] [Google Scholar]
  • 16.Naveed, H., Khan, A.U., Qiu, S., Saqib, M., Anwar, S., Usman, M.et al. (2023) A comprehensive overview of large language models [preprint arXiv:2307.06435]. arXiv. 10.48550/arXiv.2307.06435 [DOI]
  • 17.Overton, C.E. (1901) Studien über die Narkose: zugleich ein Beitrag zur allgemeinen pharmakologie. G. Fischer [Google Scholar]
  • 18.Meyer, H. (1901) Zur theorie der alkoholnarkose. Archiv f. experiment. Pathol. u. Pharmakol. 46, 338–346 10.1007/BF01978064 [DOI] [Google Scholar]
  • 19.Hansch, C., Maloney, P.P., Fujita, T. and Muir, R.M. (1962) Correlation of biological activity of phenoxyacetic acids with hammett substituent constants and partition coefficients. Nature 194, 178–180 10.1038/194178b0 [DOI] [Google Scholar]
  • 20.Cherkasov, A., Muratov, E.N., Fourches, D., Varnek, A., Baskin, I.I., Cronin, M.et al. (2014) QSAR modeling: where have you been? where are you going to? J. Med. Chem. 57, 4977–5010 10.1021/jm4004285 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 21.Geladi, P. (1988) Notes on the history and nature of partial least squares (PLS) modelling. J. Chemom. 2, 231–246 10.1002/cem.1180020403 [DOI] [Google Scholar]
  • 22.Gasteiger, J. and Zupan, J. (1993) Neural networks in chemistry. Angew. Chem. Int. Ed. Engl. 32, 503–527 10.1002/anie.199305031 [DOI] [Google Scholar]
  • 23.Harper, G., Bradshaw, J., Gittins, J.C., Green, D.V. and Leach, A.R. (2001) Prediction of biological activity for high-throughput screening using binary kernel discrimination. J. Chem. Inf. Comput. Sci. 41, 1295–1300 10.1021/ci000397q [DOI] [PubMed] [Google Scholar]
  • 24.Svetnik, V., Liaw, A., Tong, C., Culberson, J.C., Sheridan, R.P. and Feuston, B.P. (2003) Random forest: a classification and regression tool for compound classification and QSAR modeling. J. Chem. Inf. Comput. Sci. 43, 1947–1958 10.1021/ci034160g [DOI] [PubMed] [Google Scholar]
  • 25.Warmuth, M.K., Liao, J., Rätsch, G., Mathieson, M., Putta, S. and Lemmen, C. (2003) Active learning with support vector machines in the drug discovery process. J. Chem. Inf. Comput. Sci. 43, 667–673 10.1021/ci025620t [DOI] [PubMed] [Google Scholar]
  • 26.Darvas, F. (1974) Application of the sequential simplex method in designing drug analogs. J. Med. Chem. 17, 799–804 10.1021/jm00254a004 [DOI] [PubMed] [Google Scholar]
  • 27.Hodgkin, E.E. The Castlemaine project: development of an AI-based drug design support system. In Molecular Modelling and Drug Design, pp. 137–169, Palgrave, 10.1007/978-1-349-12973-7_4 [DOI] [Google Scholar]
  • 28.Gillet, V.J., Willett, P., Fleming, P.J. and Green, D.V.S. (2002) Designing focused libraries using MoSELECT. J. Mol. Graph. Model. 20, 491–498 10.1016/s1093-3263(01)00150-4 [DOI] [PubMed] [Google Scholar]
  • 29.Nicolaou, C.A., Brown, N. and Pattichis, C.S. (2007) Molecular optimization using computational multi-objective methods. Curr. Opin. Drug Discov. Devel. 10, 316–324 [PubMed] [Google Scholar]
  • 30.Cartmell, J., Enoch, S., Krstajic, D. and Leahy, D.E. (2005) Automated QSPR through competitive workflow. J. Comput. Aided Mol. Des. 19, 821–833 10.1007/s10822-005-9029-8 [DOI] [PubMed] [Google Scholar]
  • 31.Cox, R., Green, D.V.S., Luscombe, C.N., Malcolm, N. and Pickett, S.D. (2013) QSAR workbench: automating QSAR modeling to drive compound design. J. Comput. Aided Mol. Des. 27, 321–336 10.1007/s10822-013-9648-4 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32.Athanasiou, C. and Cournia, Z. (2018) From computers to bedside: computational chemistry contributing to FDA approval. In Biomolecular Simulations in Structure-Based Drug Discovery, pp. 163–203 10.1002/9783527806836 [DOI] [Google Scholar]
  • 33.Schultz, T.W., Diderich, R., Kuseva, C.D. and Mekenyan, O.G. (2018) The OECD QSAR toolbox starts its second decade. In Computational Toxicology: Methods and Protocols, pp. 55–77 10.1007/978-1-4939-7899-1_2 [DOI] [PubMed] [Google Scholar]
  • 34.Griffen, E., Leach, A.G., Robb, G.R. and Warner, D.J. (2011) Matched molecular pairs as a medicinal chemistry tool. J. Med. Chem. 54, 7739–7750 10.1021/jm200452d [DOI] [PubMed] [Google Scholar]
  • 35.Hussain, J. and Rea, C. (2010) Computationally efficient algorithm to identify matched molecular pairs (MMPs) in large data sets. J. Chem. Inf. Model. 50, 339–348 10.1021/ci900450m [DOI] [PubMed] [Google Scholar]
  • 36.Ehmki, E.S.R. and Kramer, C. (2017) Matched molecular series: measuring SAR similarity. J. Chem. Inf. Model. 57, 1187–1196 10.1021/acs.jcim.6b00709 [DOI] [PubMed] [Google Scholar]
  • 37.Kandasamy, K., Vysyaraju, K.R., Neiswanger, W., Paria, B., Collins, C.R., Schneider, J.et al. (2020) Tuning hyperparameters without grad students: scalable and robust Bayesian optimisation with Dragonfly. J. Mach. Learn. Res. 21, 1–27 [Google Scholar]
  • 38.Grygorenko, O.O., Radchenko, D.S., Dziuba, I., Chuprina, A., Gubina, K.E. and Moroz, Y.S. (2020) Generating multibillion chemical space of readily accessible screening compounds. iScience 23, 101681 10.1016/j.isci.2020.101681 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39.Abel, R., Wang, L., Harder, E.D., Berne, B.J. and Friesner, R.A. (2017) Advancing drug discovery through enhanced free energy calculations. Acc. Chem. Res. 50, 1625–1632 10.1021/acs.accounts.7b00083 [DOI] [PubMed] [Google Scholar]
  • 40.Tropsha, A., Isayev, O., Varnek, A., Schneider, G. and Cherkasov, A. (2024) Integrating QSAR modelling and deep learning in drug discovery: the emergence of deep QSAR. Nat. Rev. Drug Discov. 23, 141–155 10.1038/s41573-023-00832-0 [DOI] [PubMed] [Google Scholar]
  • 41.Du, Y., Jamasb, A.R., Guo, J., Fu, T., Harris, C., Wang, Y.et al. (2024) Machine learning-aided generative molecular design. Nat. Mach. Intell. 6, 589–604 10.1038/s42256-024-00843-5 [DOI] [Google Scholar]
  • 42.Smith, J.S., Isayev, O. and Roitberg, A.E. (2017) ANI-1: an extensible neural network potential with DFT accuracy at force field computational cost. Chem. Sci. 8, 3192–3203 10.1039/c6sc05720a [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 43.Jumper, J., Evans, R., Pritzel, A., Green, T., Figurnov, M., Ronneberger, O.et al. (2021) Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 10.1038/s41586-021-03819-2 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 44.Krishna, R., Wang, J., Ahern, W., Sturmfels, P., Venkatesh, P., Kalvet, I.et al. (2024) Generalized biomolecular modeling and design with RoseTTAFold All-atom. Science 384, eadl2528 10.1126/science.adl2528 [DOI] [PubMed] [Google Scholar]
  • 45.Ramsundar, B., Liu, B., Wu, Z., Verras, A., Tudor, M., Sheridan, R.P.et al. (2017) Is multitask deep learning practical for pharma? J. Chem. Inf. Model. 57, 2068–2076 10.1021/acs.jcim.7b00146 [DOI] [PubMed] [Google Scholar]
  • 46.Kaufman, B., Williams, E.C., Underkoffler, C., Pederson, R., Mardirossian, N., Watson, I.et al. (2024) COATI: multimodal contrastive pretraining for representing and traversing chemical space. J. Chem. Inf. Model. 64, 1145–1157 10.1021/acs.jcim.3c01753 [DOI] [PubMed] [Google Scholar]
  • 47.Segler, M.H.S., Kogej, T., Tyrchan, C. and Waller, M.P. (2018) Generating focused molecule libraries for drug discovery with recurrent neural networks. ACS Cent. Sci. 4, 120–131 10.1021/acscentsci.7b00512 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 48.Ahmad, W., Simon, E., Chithrananda, S., Grand, G. and Ramsundar, B. (2022) ChemBERTa-2: towards chemical foundation models [preprint arXiv:2209.01712]. arXiv. 10.48550/arXiv.2209.01712 [DOI]
  • 49.Madani, A., Krause, B., Greene, E.R., Subramanian, S., Mohr, B.P., Holton, J.M.et al. (2023) Large language models generate functional protein sequences across diverse families. Nat. Biotechnol. 41, 1099–1106 10.1038/s41587-022-01618-2 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 50.Fluetsch, A., Di Lascio, E., Gerebtzoff, G. and Rodríguez-Pérez, R. (2024) Adapting deep learning QSPR models to specific drug discovery projects. Mol. Pharm. 21, 1817–1826 10.1021/acs.molpharmaceut.3c01124 [DOI] [PubMed] [Google Scholar]
  • 51.King-Smith, E. (2024) Transfer learning for a foundational chemistry model. Chem. Sci. 15, 5143–5151 10.1039/d3sc04928k [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 52.Bassani, D., Brigo, A. and Andrews-Morger, A. (2023) Federated learning in computational toxicology: an industrial perspective on the effiris hackathon. Chem. Res. Toxicol. 36, 1503–1517 10.1021/acs.chemrestox.3c00137 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 53.Heyndrickx, W., Mervin, L., Morawietz, T., Sturm, N., Friedrich, L., Zalewski, A.et al. (2024) MELLODDY: cross-pharma federated learning at unprecedented scale unlocks benefits in QSAR without compromising proprietary information. J. Chem. Inf. Model. 64, 2331–2344 10.1021/acs.jcim.3c00799 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 54.Zhavoronkov, A., Ivanenkov, Y.A., Aliper, A., Veselov, M.S., Aladinskiy, V.A., Aladinskaya, A.V.et al. (2019) Deep learning enables rapid identification of potent DDR1 kinase inhibitors. Nat. Biotechnol. 37, 1038–1040 10.1038/s41587-019-0224-x [DOI] [PubMed] [Google Scholar]
  • 55.Yoshimori, A., Asawa, Y., Kawasaki, E., Tasaka, T., Matsuda, S., Sekikawa, T.et al. (2021) Design and synthesis of DDR1 inhibitors with a desired pharmacophore using deep generative models. ChemMedChem 16, 955–958 10.1002/cmdc.202000786 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 56.Grisoni, F., Huisman, B.J.H., Button, A.L., Moret, M., Atz, K., Merk, D.et al. (2021) Combining generative artificial intelligence and on-chip synthesis for de novo drug design. Sci. Adv. 7, eabg3338 10.1126/sciadv.abg3338 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 57.Garnsey, M.R., Robinson, M.C., Nguyen, L.T., Cardin, R., Tillotson, J., Mashalidis, E.et al. (2024) Discovery of SARS-CoV-2 papain-like protease (PLpro) inhibitors with efficacy in a murine infection model. Sci. Adv. 10, eado4288 10.1126/sciadv.ado4288 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 58.Besnard, J., Ruda, G.F., Setola, V., Abecassis, K., Rodriguiz, R.M., Huang, X.P.et al. (2012) Automated design of ligands to polypharmacological profiles. Nature 492, 215–220 10.1038/nature11691 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 59.Olszewski, A., Kahn, D., Yoo, B., Tan, J.B., Gupta, V.K., Schuck, E.et al. (2023) A Phase 1, open-label, multicenter, dose-escalation study of Sgr-1505 as monotherapy in subjects with mature B-cell malignancies. Blood 142, 3102–3102 10.1182/blood-2023-182838 [DOI] [Google Scholar]
  • 60.Nie, Z. and Schrodinger Inc. Hit to development candidate in 10 months: rapid discovery of a novel, potent MALT1 inhibitor. https://www.schrodinger.com/life-science/learn/case-studies/hit-development-candidate-10-months-rapid-discovery-novel-potent-malt1-inhibitor/
  • 61.Klarich, K., Goldman, B., Kramer, T., Riley, P. and Walters, W.P. (2024) Thompson sampling─an efficient method for searching ultralarge synthesis on demand databases. J. Chem. Inf. Model. 64, 1158–1171 10.1021/acs.jcim.3c01790 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 62.de Oliveira, S., Pedawi, A., Kenyon, V. and van den Bedem, H. (2024) NGT: generative AI with synthesizability guarantees identifies potent inhibitors for a G-protein associated melanocortin receptor in a tera-scale vHTS screen. ChemRxiv. 10.26434/chemrxiv-2024-fz37h-v3 [DOI]
  • 63.Thomas, M., Matricon, P.G., Gillespie, R.J., Napiórkowska, M., Neale, H., Mason, J.S.et al. (2024) Modern hit-finding with structure-guided de novo design: identification of novel nanomolar adenosine A2A receptor ligands using reinforcement learning. ChemRxiv. 10.26434/chemrxiv-2024-wh7zw-v2 [DOI] [PMC free article] [PubMed]
  • 64.Mendez, D., Gaulton, A., Bento, A.P., Chambers, J., Veij, M., Félix, E.et al. (2019) ChEMBL: towards direct deposition of bioassay data. Nucleic Acids Res. 47, D930–D940 10.1093/nar/gky1075 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 65.Ahmad, S., Xu, J., Feng, J.A., Hutchinson, A., Zeng, H., Ghiabi, P.et al. (2023) Discovery of a first-in-class small-molecule ligand for WDR91 using DNA-encoded chemical library selection followed by machine learning. J. Med. Chem. 66, 16051–16061 10.1021/acs.jmedchem.3c01471 [DOI] [PubMed] [Google Scholar]
  • 66.Woolfson, D.N. (2021) A brief history of de novo protein design: minimal, rational, and computational. J. Mol. Biol. 433, 167160 10.1016/j.jmb.2021.167160 [DOI] [PubMed] [Google Scholar]
  • 67. NobelPrize.org . The nobel prize in chemistry 2018. https://www.nobelprize.org/prizes/chemistry/2018/summary/
  • 68.Notin, P., Rollins, N., Gal, Y., Sander, C. and Marks, D. (2024) Machine learning for functional protein design. Nat. Biotechnol. 42, 216–228 10.1038/s41587-024-02127-0 [DOI] [PubMed] [Google Scholar]
  • 69.Callaway, E. (2024) Chemistry Nobel goes to developers of AlphaFold AI that predicts protein structures. Nature 634, 525–526 10.1038/d41586-024-03214-7 [DOI] [PubMed] [Google Scholar]
  • 70.Abramson, J., Adler, J., Dunger, J., Evans, R., Green, T., Pritzel, A.et al. (2024) Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature 630, 493–500 10.1038/s41586-024-07487-w [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 71.Lin, Z., Akin, H., Rao, R., Hie, B., Zhu, Z., Lu, W.et al. (2023) Evolutionary-scale prediction of atomic-level protein structure with a language model. Science 379, 1123–1130 10.1126/science.ade2574 [DOI] [PubMed] [Google Scholar]
  • 72.Berman, H., Henrick, K., Nakamura, H. and Markley, J.L. (2007) The worldwide protein data bank (wwPDB): ensuring a single, uniform archive of PDB data. Nucleic Acids Res. 35, D301–3 10.1093/nar/gkl971 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 73.Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N.et al. (2017) Attention is all you need. Adv. Neural Inf. Process. Syst. 30, 5999–6009 [Google Scholar]
  • 74.Watson, J.L., Juergens, D., Bennett, N.R., Trippe, B.L., Yim, J., Eisenach, H.E.et al. (2023) De novo design of protein structure and function with RFdiffusion. Nature 620, 1089–1100 10.1038/s41586-023-06415-8 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 75.Zambaldi, V., La, D., Chu, A.E., Patani, H., Danson, A.E., Kwan, T.O.et al. (2024) De novo design of high-affinity protein binders with AlphaProteo [preprint arXiv:2409.08022]. arXiv. 10.48550/arXiv.2409.08022 [DOI]
  • 76.Shanehsazzadeh, A., McPartlon, M., Kasun, G., Steiger, A.K., Sutton, J.M., Yassine, E.et al. (2023) Unlocking de novo antibody design with generative artificial intelligence [preprint]. bioRxiv. 10.1101/2023.01.08.523187 [DOI] [Google Scholar]
  • 77.M Bran, A., Cox, S., Schilter, O., Baldassari, C., White, A.D. and Schwaller, P. (2024) Augmenting large language models with chemistry tools. Nat. Mach. Intell. 6, 525–535 10.1038/s42256-024-00832-8 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 78.Boiko, D.A., MacKnight, R., Kline, B. and Gomes, G. (2023) Autonomous chemical research with large language models. Nature 624, 570–578 10.1038/s41586-023-06792-0 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 79.Burley, S.K., Bhikadiya, C., Bi, C., Bittrich, S., Chao, H., Chen, L.et al. (2023) RCSB protein data bank (RCSB.org): delivery of experimentally-determined PDB structures alongside one million computed structure models of proteins from artificial intelligence/machine learning. Nucleic Acids Res. 51, D488–D508 10.1093/nar/gkac1077 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 80.Durant, G., Boyles, F., Birchall, K. and Deane, C.M. (2024) The future of machine learning for small-molecule drug discovery will be driven by data. Nat. Comput. Sci. 4, 1–9 10.1038/s43588-024-00699-0 [DOI] [PubMed] [Google Scholar]
  • 82.Buttenschoen, M., Morris, G.M. and Deane, C.M. (2024) PoseBusters: AI-based docking methods fail to generate physically valid poses or generalise to novel sequences. Chem. Sci. 15, 3130–3139 10.1039/d3sc04185a [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 82.Crusius, D., Cipcigan, F. and Biggin, P.C. (2025) Are we fitting data or noise? Analysing the predictive power of commonly used datasets in drug-, materials-, and molecular-discovery. Faraday Discuss. 256, 304–321 10.1039/d4fd00091a [DOI] [PubMed] [Google Scholar]
  • 83.Chakravarty, D., Schafer, J.W., Chen, E.A., Thole, J.F., Ronish, L.A., Lee, M.et al. (2024) AlphaFold predictions of fold-switched conformations are driven by structure memorization. Nat. Commun. 15, 7296 10.1038/s41467-024-51801-z [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 84.Coveney, P.V. and Highfield, R. (2024) Artificial intelligence must be made more scientific. J. Chem. Inf. Model. 64, 5739–5741 10.1021/acs.jcim.4c01091 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 85.Birhane, A., Kasirzadeh, A., Leslie, D. and Wachter, S. (2023) Science in the age of large language models. Nat. Rev. Phys. 5, 277–280 10.1038/s42254-023-00581-4 [DOI] [Google Scholar]
  • 86.Kryshtafovych, A., Schwede, T., Topf, M., Fidelis, K. and Moult, J. (2023) Critical assessment of methods of protein structure prediction (CASP)-round XV. Proteins 91, 1539–1549 10.1002/prot.26617 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 87.Wognum, C., Ash, J.R., Aldeghi, M., Rodríguez-Pérez, R., Fang, C., Cheng, A.C.et al. (2024) A call for an industry-led initiative to critically assess machine learning for real-world drug discovery. Nat. Mach. Intell. 6, 1120–1121 10.1038/s42256-024-00911-w [DOI] [Google Scholar]
  • 88.Rosemann, M., vom Brocke, J., Van Looy, A. and Santoro, F. (2024) Business process management in the age of AI – three essential drifts. Inf. Syst. E-Bus. Manage. 22, 1–15 10.1007/s10257-024-00689-9 [DOI] [Google Scholar]
  • 89.Green, D.V.S. (2019) Using machine learning to inform decisions in drug discovery: an industry perspective. In Machine Learning in Chemistry. ACS Symposium Series (Pyzer-Knapp, E., ed), pp. 81–101. 10.1021/bk-2019-1326.ch005 [DOI] [Google Scholar]
  • 90.Oprea, T.I. and Weininger, D. (2024) Rethinking medicinal chemistry in the cheminformatics age. J. Med. Chem. 67, 17935–17939 10.1021/acs.jmedchem.4c02179 [DOI] [PubMed] [Google Scholar]
  • 91. McKinsey & Company . (2024) Moving past gen AI’s honeymoon phase: seven hard truths for CIOs to get from pilot to scale. https://www.mckinsey.com/capabilities/mckinsey-digital/our-insights/moving-past-gen-ais-honeymoon-phase-seven-hard-truths-for-cios-to-get-from-pilot-to-scale
  • 92.Lucas, H.C. and Goh, J.M. (2009) Disruptive technology: how Kodak missed the digital photography revolution. J. Strateg. Inf. Syst. 18, 46–55 10.1016/j.jsis.2009.01.002 [DOI] [Google Scholar]
  • 93.Frasnetti, E., Cucchi, I., Pavoni, S., Frigerio, F., Cinquini, F., Serapian, S.A.et al. (2024) Integrating molecular dynamics and machine learning algorithms to predict the functional profile of kinase ligands. J. Chem. Theory Comput. 20, 9209–9229 10.1021/acs.jctc.4c01097 [DOI] [PubMed] [Google Scholar]
  • 94.Zenil, H., Tegnér, J., Abrahão, F.S., Lavin, A., Kumar, V., Frey, J.G.et al. (2023) The future of fundamental science led by generative closed-loop artificial intelligence [preprint arXiv:2307.07522]. arXiv. 10.48550/arXiv.2307.07522 [DOI]
  • 95.Saikin, S.K., Kreisbeck, C., Sheberla, D., Becker, J.S. and Aspuru-Guzik, A. (2019) Closed-loop discovery platform integration is needed for artificial intelligence to make an impact in drug discovery. Expert Opin. Drug Discov. 14, 1–4 10.1080/17460441.2019.1546690 [DOI] [PubMed] [Google Scholar]
  • 96.Coley, C.W., Eyke, N.S. and Jensen, K.F. (2020) Autonomous discovery in the chemical sciences part I: progress. Angew. Chem. Int. Ed. Engl. 59, 22858–22893 10.1002/anie.201909987 [DOI] [PubMed] [Google Scholar]
  • 97.Coley, C.W., Eyke, N.S. and Jensen, K.F. (2020) Autonomous discovery in the chemical sciences part II: outlook. Angew. Chem. Int. Ed. 59, 23414–23436 10.1002/anie.201909989 [DOI] [PubMed] [Google Scholar]
  • 98.Sparkes, A., Aubrey, W., Byrne, E., Clare, A., Khan, M.N., Liakata, M.et al. (2010) Towards robot scientists for autonomous scientific discovery. Autom. Exp. 2, 1 10.1186/1759-4499-2-1 [DOI] [PMC free article] [PubMed] [Google Scholar]

Articles from Emerging Topics in Life Sciences are provided here courtesy of Portland Press Ltd