Emerging Frontiers in Virtual Drug Discovery: From Quantum Mechanical Methods to Deep Learning Approaches

Christoph Gorgulla; Abhilash Jayaraj; Konstantin Fackeldey; Haribabu Arthanari

doi:10.1016/j.cbpa.2022.102156

Author manuscript; available in PMC: 2023 Aug 1.

Published in final edited form as: Curr Opin Chem Biol. 2022 May 13;69:102156. doi: 10.1016/j.cbpa.2022.102156

PMCID: PMC9990419 NIHMSID: NIHMS1875512 PMID: 35576813

Abstract

Virtual screening-based approaches to discover initial hit and lead compounds have the potential to reduce both the cost and time of early drug discovery stages, as well as to find inhibitors for even challenging target sites such as protein-protein interfaces. In this review, we provide an overview of the progress that has been made in virtual screening methodology and technology on multiple fronts in recent years. The advent of ultra-large virtual screens, in which hundreds of millions to billions of compounds are screened, has proven to be a powerful approach to discover highly potent hit compounds. However, these developments are just the tip of the iceberg, with new technologies and methods emerging to propel the field forward. Examples include novel machine-learning approaches, which can reduce the computational costs of virtual screening dramatically, while progress in quantum-mechanical approaches increases the accuracy of predictions of various small molecule properties.

Keywords: structure-based virtual screens, quantum chemistry, ultra-large virtual screens, machine learning, molecular docking, ligand preparation

2000 MSC: 65Y05, 68W10

1. Introduction

The idea of in silico screening, also referred to as virtual screening, is to use a computational approach to predict the ability of a small molecule to engage a biological target molecule. It has been estimated that the total chemical space of small molecules suitable for drug discovery contains over 10⁶⁰ possible compounds [7]. Traditional high-throughput screens are often limited to tens of thousands to hundreds of thousands of compounds, with the higher end of this spectrum typically only feasible in big pharmaceutical settings. Virtual screens, on the other hand, are currently capable of sampling billions of small molecules. The output of a virtual screening routine is a prediction of the binding affinities between each small molecule and the target biomolecule, typically a protein. The process of predicting the binding pose and related binding affinity of a single molecule for the target, also known as a receptor, is referred to as molecular docking. A docking routine can be divided into two independent parts. The first part is the search of the conformational space based on the degrees of freedom of the receptor and the ligand. In order to find suitable spatial arrangements of a ligand relative to the given target, the search space can be as restrictive as a single site on the protein of interest (e.g., the active site of an enzyme) or as expansive as the entire surface of the receptor (so-called “blind docking”). This first stage is often also referred to as “pose generation”. The second part consists of a scoring method, which assigns to each conformation a value correlating with the binding affinity of the ligand to the receptor. If the predicted binding affinity is high, then the ligand is a good candidate for further investigation as a potential hit/lead compound. In the last couple of decades, a number of methods have been developed to address the conformational space sampling and the scoring function of the screen, as well as the exploration of the chemical space.
For instance, novel algorithms were adapted from the computer science field to effectively sample the conformational space, including algorithms inspired by natural phenomena such as the grey wolf and ant colony optimization routines [23, 22, 81, 40]. The scoring functions themselves [8] can be broadly divided into three different categories: force-field-based scoring functions (e.g., [34, 76]), empirical scoring functions (e.g., [10, 45, 54]), and knowledge-based scoring functions (e.g., [75]).

In current implementations of docking routines there is often a compromise between computational rigor and speed. This, combined with the inherent inaccuracies in the docking methods themselves, can lead to false positives in the virtual screen. One way to address this problem is by making the docking method more accurate. One promising approach for improving the accuracy of docking is through the use of quantum mechanics (QM)-based methods. QM-based routines promise more accurate results, but are computationally demanding. An orthogonal approach taken by other screens makes use of machine learning (ML)-based methods, which have the potential to be computationally more efficient. However, both of these approaches are only useful if the chemical space being explored contains suitable small molecules capable of binding tightly to the receptor. Recent studies have indicated that ultra-large virtual screens, which explore a vast space of possible candidates, not only identify highly potent hit and lead compounds, but also have a reduced false positive rate due to increased error tolerance in the top scoring virtual hits. In this review, we summarize recent progress regarding ultra-large virtual screens and look forward to possible future contributions to the field from ML-based screening approaches and quantum chemistry-based methods.

2. Ultra-large physics-based virtual screening

Ultra-large virtual screens are screens in which 100 million or more ligands are screened against a biological macromolecule (also referred to here as a “receptor”). Previously, “standard-sized” virtual screens typically covered a few million compounds at most. Figure 1 illustrates the advantage that an ultra-large screen provides over one that uses a standard-sized ligand library. The smaller scope of a standard-sized virtual screen means that the probability of identifying a true hit is lower than that in an ultra-large screen. Another important aspect of virtual screening is its multi-scale character: approximate computations of binding affinity require less computational power, therefore allowing more ligands, and thus a larger proportion of the chemical space, to be covered by a given set of resources. In contrast, very accurate calculations of binding affinity are computationally demanding and therefore substantially fewer calculations can be performed in the same amount of time.

Figure 1: The large blue ellipse represents the entire chemical space of small drug-like molecules while the stars represent individual molecules. Blue stars represent molecules that bind weakly or not at all to a specific target protein. The yellow stars represent molecules that bind strongly to the target protein. The small and large black-framed ellipses represent the scope of standard and ultra-large virtual screens, respectively, illustrating that the chance of finding exceptionally strong binding molecules in the screen increases with scale. Ultra-large virtual screens can be carried out by classical physics-based docking programs, as well as AI-based screening methods. Quantum mechanics-based algorithms as well as AI-based approaches can then be used to further filter the initial hits by re-scoring them or by predicting important properties, *e.g.*, those related to ADMET characteristics.

Ultra-large virtual screening (ULVS) became feasible due to four recent developments: 1) tangible ultra-large commercial libraries, 2) access to powerful high-performance computing clusters through cloud-based platforms, 3) ready availability of high-resolution structures or models, and 4) virtual screening software platforms that enable the screening of ultra-large ligand libraries.

2.1. Ligand libraries

Ultra-large virtual screening libraries of commercially available compounds became accessible for the first time around 2015 with the creation of the ZINC15 library [68]. While it initially contained approximately 120 million compounds, the ZINC15 library has continued to grow in size and number of features, and today contains the compound catalogs of hundreds of vendors. At the time of writing, the ZINC15 database contained approximately 750 million compounds in ready-to-dock format, and 1.49 billion compounds in total (i.e., around 50% of the compounds have been prepared into a ready-to-dock format). In 2020, the next generation of the library was released, and the ZINC15 database became the ZINC20 database, a library that is substantially larger than the original ZINC15 library [33]. The majority of the compounds in this database are classed as “on-demand” compounds, which means that they are not immediately available for shipping, but are synthesized by the compound vendors upon request. To date, there are three ultra-large on-demand libraries available: the REAL library from Enamine [28], the CHEMriya library from Otava¹, and the make-on-demand library from WuXi². These on-demand libraries are designed by each company based on their available building blocks. Since the chemistry using these building blocks is established at these companies, high success rates are obtained in the on-demand synthesis of these ultra-large libraries. However, since these libraries are built from a pre-defined set of building blocks, the diversity of the resulting ultra-large libraries has been questioned. This concern was examined directly in [33, 72], which showed that the modern on-demand chemical libraries have a favorable inherent diversity that originates from the unique nature of the building blocks. Often the molecules in these ultra-large libraries are enriched for drug-like characteristics; for instance, they often conform to Lipinski’s rule of 5 [46].
However, the “flatness” of these ultra-large libraries is still partially an open question. Attention to the degree of carbon saturation, quantified by the fraction of sp3-hybridized carbons and the number of chiral carbons, can guide the curation of libraries with diverse three-dimensional shapes [48].
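As an illustration of the two library-curation criteria mentioned above, the following minimal sketch counts Lipinski rule-of-5 violations and computes the fraction of sp3-hybridized carbons. It assumes that the descriptors have already been computed by a cheminformatics toolkit; the `Descriptors` class, its field names, and the example values are hypothetical and are not taken from any of the libraries discussed here.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed descriptors for one molecule (hypothetical container)."""
    mol_weight: float        # molecular weight in Da
    logp: float              # octanol-water partition coefficient
    h_bond_donors: int
    h_bond_acceptors: int
    sp3_carbons: int         # number of sp3-hybridized carbon atoms
    total_carbons: int       # total number of carbon atoms


def lipinski_violations(d: Descriptors) -> int:
    """Count violations of the rule of 5 in its commonly quoted form."""
    checks = [
        d.mol_weight > 500,
        d.logp > 5,
        d.h_bond_donors > 5,
        d.h_bond_acceptors > 10,
    ]
    return sum(checks)


def fraction_sp3(d: Descriptors) -> float:
    """Fraction of carbon atoms that are sp3-hybridized (0.0 if no carbons)."""
    return d.sp3_carbons / d.total_carbons if d.total_carbons else 0.0


# Made-up example molecule, for illustration only.
example = Descriptors(mol_weight=350.0, logp=2.1, h_bond_donors=2,
                      h_bond_acceptors=5, sp3_carbons=6, total_carbons=18)
print(lipinski_violations(example), round(fraction_sp3(example), 2))
```

A low fraction of sp3 carbons indicates a flatter, more aromatic molecule, so a curation step could use a threshold on this value alongside the rule-of-5 filter; any such threshold would be a design choice rather than something prescribed by the sources cited above.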

2.2. High-performance computing infrastructure

High-performance computing (HPC) infrastructure has become widely available in the last two decades to researchers in biomedical fields. Initially, university computer clusters and national supercomputing centers provided most of the high-performance computational power. Between 2005 and 2010, the first large-scale cloud computing infrastructure became available with Amazon Web Services (AWS). Since then, other major cloud providers have entered the field, such as Google with Google Cloud, Microsoft with Azure, and IBM with IBM Cloud. These cloud environments can provide virtual machines and millions of processors at any given time, and can be set up to mimic classical Linux clusters such as a SLURM cluster. An up-to-date list of the most powerful computer systems available can be found in the TOP500 list³. Processes that scale with the number of compute nodes, such as molecular docking, are particularly well positioned to take advantage of this new HPC accessibility.
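Docking scales well with the number of nodes because each ligand can be processed independently of all others. A minimal sketch of how a library might be divided into contiguous slices, one per job (for example, one per task of a SLURM job array), is given below. The function name and the numbers are illustrative assumptions and do not describe how any particular screening platform partitions its libraries.

```python
def split_into_chunks(n_ligands: int, n_jobs: int) -> list[tuple[int, int]]:
    """Return half-open index ranges (start, end), one per job.

    The first `n_ligands % n_jobs` jobs receive one extra ligand so that
    chunk sizes differ by at most one.
    """
    base, extra = divmod(n_ligands, n_jobs)
    ranges = []
    start = 0
    for job in range(n_jobs):
        size = base + (1 if job < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


# Hypothetical example: one million ligands spread over 8 jobs.
for job_id, (start, end) in enumerate(split_into_chunks(1_000_000, 8)):
    print(f"job {job_id}: ligands {start}..{end - 1}")
```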

Another way in which improvements in computing infrastructure can aid virtual screening is through the modeling of protein dynamics. Appropriate consideration of the target biomolecule’s conformational dynamics during docking is important for accurate evaluation of hit binding. Advances in computational resources, especially improvements in GPU hardware and dedicated HPC designs like the Anton supercomputer [65], have enabled access to long molecular dynamics trajectories (μs to ms) that help capture the dynamics of target proteins.

2.3. Protein structures

For structure-based virtual screens, high-resolution structural information is a necessary starting point. Historically, atomic or near-atomic resolution protein structures were obtained using X-ray crystallography or NMR spectroscopy. However, the resolution revolution in the field of cryogenic electron microscopy (cryo-EM) has meant that cryo-EM has also emerged as a powerful and accessible method capable of providing high-resolution structural information of large molecular weight systems, including multi-protein complexes and membrane proteins in their lipid environments [41, 70]. While the resolution of cryo-EM-derived structures is constantly improving to match that of crystal structures, care should still be taken when using a cryo-EM-based model to consider the local resolution of the original EM map at the docking site.

At the time of writing, experimentally derived structures are available for only approximately 35% of human proteins [74], and often only a part of the protein is covered by the available structure. In the past, homology modeling was a promising option for cases in which the structure of a protein of similar sequence was available. The advent of AlphaFold 2 [35] and RoseTTAFold [4] in 2021 brought about a seismic shift in the field of de novo protein structure determination. These methods use ultra-deep neural networks and have been shown to be able to predict the structure of proteins with experimental accuracy. AlphaFold 2 was used to predict the structures of almost the entire human proteome, and these predictions have been made publicly available in the AlphaFold Structure Database [74]. This new abundance of high-quality protein structures is of immense value for structure-based virtual screening approaches. Another important aspect to consider in the docking process is the dynamics of the protein. Here again, advances in computational resources, especially improvements in GPU hardware and dedicated HPC designs like the Anton supercomputer, have enabled access to long molecular dynamics trajectories (μs to ms), which help capture these dynamics [65]. Biochemically relevant and long-lived conformations of biomolecules can then be selected with techniques like clustering and principal component analysis (PCA) and used in structure-based virtual screening [16, 15, 80]. This in turn has been shown to lead to the selection of more potent hits during the screening process [47]. Molecular dynamics also enables post-docking evaluation of selected hits by helping to estimate binding affinity and to study the effects of solvation and structural dynamics on protein-ligand complexes [1, 61]. Clustering algorithms are known to be compute-intensive, with a complexity of O(N²), where N represents the number of frames in the MD trajectory [64].
With MD trajectories regularly representing microsecond- to millisecond-long time scales and containing millions of MD frames, this represents a significant time and memory investment. To circumvent these high demands, clustering can be used in conjunction with PCA to guide the selection of biologically significant clusters [79, 53]. Additionally, recent advances have described novel implementations of clustering algorithms with significantly lower space and time complexities [55, 47]. These clustering algorithm implementations have been shown to be 2-3 times more efficient than their predecessors.
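To make the O(N²) cost concrete, the short calculation below estimates how many pairwise distances (for example, RMSD values) a clustering run would need if it stored one value for every pair of frames. It is a back-of-the-envelope sketch under stated assumptions: a method that keeps the full pairwise matrix in memory and uses 4-byte floating-point values. Not every clustering algorithm does this, and the function name is illustrative.

```python
def pairwise_matrix_cost(n_frames: int, bytes_per_value: int = 4) -> tuple[int, int]:
    """Return (number of unique frame pairs, bytes needed to store them).

    Assumes one stored distance per unordered pair of frames.
    """
    n_pairs = n_frames * (n_frames - 1) // 2
    return n_pairs, n_pairs * bytes_per_value


for n in (10_000, 1_000_000):
    pairs, n_bytes = pairwise_matrix_cost(n)
    print(f"{n:>9,} frames -> {pairs:.3e} pairs, {n_bytes / 1e9:,.1f} GB")
```

Under these assumptions, increasing the number of frames a hundredfold increases the storage requirement roughly ten-thousandfold, which is why dimensionality reduction and lower-complexity implementations matter for long trajectories.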

2.4. Early days in ultra-large virtual screens

In 2018, some of the first ultra-large virtual screens were reported for multiple protein targets including EBP1 and the MED15-KIX domain, with up to 300 million ligands screened per target (Christoph Gorgulla, PhD thesis, Freie Universität Berlin, 2018). A year later, the first peer-reviewed ultra-large virtual screens, against two G-protein-coupled receptors (GPCRs), were published by Lyu et al. [49] and included extensive experimental validation. 99 million ligands were screened against the first target, the D4 dopamine receptor, leading to 30 experimentally confirmed initial hits with submicromolar binding affinity, including one compound with a binding affinity of 180 pM. The screen against the second target protein, the AmpC β-lactamase (AmpC), resulted in a compound with a K_i of 77 nM after optimization, one of the most potent non-covalent inhibitors of AmpC discovered to date. This study also experimentally demonstrated for the first time that the true hit rate within the top scoring compounds increases with the scale of the screen. In 2020, the first ultra-large virtual screen with over 1 billion compounds was reported, making use of a new, open-source platform called VirtualFlow [21]. A second ultra-large virtual screen was also published that year that screened for initial hits against the melatonin receptor. Optimized hits from this second screen were then shown to be able to modulate the circadian rhythm of mice in animal studies [67]. Just last year, in one of the largest screening campaigns against a disease, 40 functional sites in 17 proteins critical for the viability of SARS-CoV-2 were targeted with 1.1 billion molecules each, resulting in approximately 50 billion docking instances [24]. A recent publication summarizes the top ultra-large virtual screens with detailed tabulation that includes their corresponding true-hit rates [5].
These screens and the data they generate, including experimental validation of resulting hits, highlight the advantages of ultra-large virtual screens and their ability to compensate for inherent limitations in the docking routines, leading to the identification of potent high-affinity binders.

2.5. Software for ultra-large virtual screens

Although there were already several established docking programs, the first dedicated open-source platform for ultra-large virtual screens, VirtualFlow, was published in 2020 [21]. VirtualFlow is able to routinely and automatically screen billions of small molecules against the structure of any given biological macromolecule. The VirtualFlow paper [21] also demonstrated mathematically what had been experimentally suggested by Lyu et al. [49]: why the true hit rate increases with the scale of the screen.
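The scale dependence described above can be illustrated with a toy Monte Carlo simulation. This is not the mathematical argument given in [21]; it is a minimal sketch under simple assumptions: true binding strengths are drawn from a standard normal distribution, the docking score equals the true value plus independent Gaussian error, higher values mean tighter binding, and a "true hit" is any ligand whose true value exceeds an arbitrary threshold. All names and parameter values are illustrative.

```python
import random


def top_k_hit_rate(n_ligands, top_k=100, noise_sd=1.0, hit_threshold=2.0, seed=0):
    """Fraction of true hits among the top_k ligands ranked by a noisy score."""
    rng = random.Random(seed)
    scored = []
    for _ in range(n_ligands):
        true_affinity = rng.gauss(0.0, 1.0)               # hypothetical true binding strength
        score = true_affinity + rng.gauss(0.0, noise_sd)  # docking score = truth + error
        scored.append((score, true_affinity))
    scored.sort(reverse=True)                             # highest (best) scores first
    top = scored[:top_k]
    return sum(1 for _, t in top if t >= hit_threshold) / len(top)


for n in (1_000, 10_000, 100_000):
    print(f"{n:>7,} ligands: hit rate in top 100 = {top_k_hit_rate(n):.2f}")
```

In this model the number of compounds sent for experimental testing stays fixed at 100, so a larger library pushes the selection further into the tail of the score distribution, where a scoring error of the same size is less likely to promote a non-binder.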

VirtualFlow consists of two independent modules: VFLP (VirtualFlow for Ligand Preparation) and VFVS (VirtualFlow for Virtual Screening). VFLP is able to convert ultra-large ligand libraries from the SMILES format to another target format, including PDB, PDBQT, or MOL2. VFVS was developed to support a number of docking programs, such as AutoDock Vina and Smina Vinardo [73, 38], and the open-source nature of VirtualFlow facilitates the inclusion of other docking routines. VFVS enables execution of both ensemble docking and consensus docking, and allows the inclusion of protein flexibility in the backbone and side chains. As part of initial testing of VirtualFlow, a commercially available ligand library from Enamine of over 1.4 billion compounds was prepared in a ready-to-dock format and made publicly available. Later the ZINC15 library was added to this collection. VirtualFlow was used to screen 1.3 billion compounds against the protein KEAP1 at its binding interface with NRF2 [21]. This was the first protein-protein interaction targeted with an ultra-large virtual screen, and it resulted in experimentally confirmed compounds with low-nanomolar binding affinity and submicromolar IC50 values for displacement of an NRF2 peptide.

Two of the ultra-large virtual screens mentioned in Section 2.4 made use of a different program called DOCK [49, 67]. DOCK is a highly configurable docking program with many advanced features. Among these features are multiple GB/SA scoring functions, a PB/SA scoring function, force field-based scoring functions, receptor flexibility scoring, macromolecular docking, flexible as well as rigid protein docking, and pharmacophore-based scoring. In 2021, a protocol was published detailing how DOCK3.7 can be used to carry out ultra-large virtual screening [5].

On another front, AutoDock 4, a commonly used docking program, has been extended to harness the computing power and cost-efficiency of recent GPU models, resulting in AutoDock-GPU [63]. When AutoDock-GPU was benchmarked, it exhibited a 350-fold speedup on a modern GPU relative to a single modern CPU core. This new docking program can now be used in ultra-large virtual screens, which was demonstrated recently when over 1 billion compounds were screened against a SARS-CoV-2 target protein [43, 1]. AutoDock-GPU has also been incorporated into a new inverse virtual screening platform called AMIDE v2 (AutoMated Inverse Docking Engine) [14], in which multiple ligands can be docked against multiple proteins.

One of the most recent methods related to ultra-large screenings is the V-SYNTHES approach [62]. This approach leverages the fact that ultra-large libraries are assembled by combining a much smaller number of building blocks, referred to as fragments. First, the fragments from which the ultra-large REAL Space is derived are docked to the target protein. Then the top-scoring fragments are selected, taking diversity into account, and assembled into full molecules that in turn are evaluated again. This approach greatly reduces the computational effort required to screen ultra-large libraries by effectively reducing the number of ligands that need to be docked.
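The saving from such a fragment-first strategy can be estimated with simple counting. The sketch below is a deliberately simplified two-component model and not the V-SYNTHES algorithm itself: it assumes a library formed from two building-block pools, a first stage that docks every fragment once, and a second stage that combines each retained fragment with every partner from the other pool. The function name, pool sizes, and cut-offs are hypothetical.

```python
def docking_counts(n_a, n_b, keep_a, keep_b):
    """Compare full enumeration with a two-stage, fragment-first screen.

    n_a, n_b       : sizes of the two building-block pools
    keep_a, keep_b : fragments retained from each pool after the first stage
    Returns (dockings for full enumeration, dockings for the staged screen).
    """
    full = n_a * n_b              # dock every A-B product
    stage_one = n_a + n_b         # dock each fragment once
    # Combine each retained fragment with every partner; products in which
    # both fragments were retained are counted only once.
    stage_two = keep_a * n_b + keep_b * n_a - keep_a * keep_b
    return full, stage_one + stage_two


full, staged = docking_counts(10_000, 10_000, 100, 100)
print(f"full: {full:,}  staged: {staged:,}  reduction: {full / staged:.1f}x")
```

In this model the reduction grows as the retained fraction shrinks, at the price of missing products whose fragments score poorly on their own.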

2.6. Challenging target classes

Ultra-large virtual screens have the ability to find exceptionally tight binding molecules due to the large chemical spaces that are being screened. This enables the screening of target classes that were traditionally considered challenging or even undruggable, such as shallow protein-protein interfaces, allosteric sites, and blind docking against entire protein surfaces, which can be useful for PROTAC development [51].

3. Ligand-based virtual screening

While structure-based virtual screening (SBVS) remains the preferred choice for most drug design endeavors, novel and potent small molecule hits can also be identified in the absence of a macromolecular receptor structure. Ligand-based virtual screening (LBVS) explores ligand libraries based on properties of ligands known to bind the target biomolecule [44, 3, 86, 27]. LBVS methods can be classified by the type of descriptors they employ for searching the chemical libraries. 1D methods employ 1D descriptors like molecular volume or hydrogen bond donor/acceptor counts. These methods are better suited to enriching the chemical space than to predicting potent hits. 2D methods search databases using information derived from 2D molecular fingerprints [59, 11]. These fingerprints can be compared using similarity metrics like the Tanimoto coefficient. 3D methods use pharmacophore [39] and shape-based [31, 25, 26] screening. These methods are slower but have been shown to achieve higher accuracies owing to the incorporation of 3D structural information. LBVS methodologies also serve as a great tool for scaffold hopping, which enables the identification of hit compounds with novel geometries while retaining high potency towards the target biomolecule [69]. Recent developments have shown that LBVS can considerably leverage machine learning and artificial intelligence to achieve higher accuracies [3, 69]. Combined screens that involve both SBVS and LBVS in conjunction with artificial intelligence methods have shown promising results by successfully scanning billion-molecule libraries with a 100-fold acceleration compared to SBVS alone [18].
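For binary fingerprints, the Tanimoto coefficient mentioned above is the number of bits set in both fingerprints divided by the number of bits set in either. The sketch below represents each fingerprint as a set of on-bit indices and ranks a small library against a query; the fingerprints and molecule names are made up for illustration, and in practice the bit sets would come from a cheminformatics toolkit.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0


# Hypothetical query ligand and library, for illustration only.
query = {1, 4, 7, 9}
library = {
    "mol_A": {1, 4, 8, 9, 12},
    "mol_B": {1, 4, 7, 9},
    "mol_C": {2, 3, 5},
}

# Rank library members by decreasing similarity to the query.
ranked = sorted(library, key=lambda name: tanimoto(query, library[name]), reverse=True)
for name in ranked:
    print(name, round(tanimoto(query, library[name]), 2))
```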

4. Machine learning (ML) approaches

4.1. Virtual screenings

ML-based approaches to dock small molecules against protein structures are computationally more efficient than physics-based docking methods by orders of magnitude. However, a main challenge of these approaches has been to achieve equivalent or better accuracy than state-of-the-art physics-based docking programs. Progress has been made in this regard by studies from multiple research groups. A novel drug discovery platform named “DeepDock” was used to carry out one of the deep learning-based ultra-large virtual screens. The platform uses quantitative structure-activity relationship (QSAR) models: deep models trained on docking scores that traditional docking methods generate for a moderate number of compounds from the chemical library to be screened [19]. Another approach taken by Gupta and Zhou has been to trim down the initial screening library by a factor of ten using Dense Neural Networks [29]. The authors of this study demonstrated that application of this technique can decrease the search space without losing potential hits in the search database. ML-based methods can also enhance large-scale virtual screens by classifying virtual hits as true and false positives [2].

4.2. ADMET predictions

Typical virtual screens provide molecules that have the potential to bind to the target macromolecule; however, success in transitioning from the hit stage to established activity in animal models requires additional pharmacological properties, including biosafety and bioavailability [37, 30, 17, 52, 71, 13, 83, 42, 50, 82, 77]. ML-based methods have also found application in the prediction of ADMET properties (Absorption, Distribution, Metabolism, Excretion, and Toxicity) for small molecules [30, 17, 52, 71, 57]. This is especially useful as structure-based virtual screens do not explicitly account for off-target binding effects [13]. With access to structures or structural models of a large portion of the human proteome, an inverse virtual screen can be performed with the top hits. On the other hand, ML-based approaches have an inherent advantage in these types of large multifactorial analyses. Models can be trained on millions of data points to correctly predict outcomes. Already, ML-based approaches have been developed to assist in the prediction of toxicity in silico [82, 77]. As toxicity prediction involves analyzing highly interconnected parameters including properties and expression of various biomolecules, which in turn are dependent on factors such as cell type, age, gender, and genome polymorphisms, these methodologies are still in the nascent stages of development. However, the hope is that someday, ML-based algorithms will help reduce the risk of hits failing at later organism-level testing.

5. Quantum chemistry approaches

Although ultra-large virtual screens do increase the true-hit rate, the average true-hit rate is around 10-20%, with the best case scenarios around 40% [5]. In the realm of ultra-large screens, thousands of candidates at the top of the list have the potential to be potent binders. For reasons of feasibility, the number of candidates chosen to proceed to experimental validation is best limited to hundreds of molecules, but with a low true-hit rate this limitation can lead to valuable molecules being overlooked. Could the true-hit rate be further improved? On an atomic level, classical physics is only a coarse approximation of nature as quantum mechanical effects begin to play a dominant role on this scale. While quantum effects occur in every atom, they are particularly strong in metal ions, whose large electron cloud can for instance experience strong polarization effects caused by the atomic environment. Metal ions are very common in proteins. It is estimated that over 25% of all proteins are metalloproteins [78]. Traditional physics-based docking methods, however, are mostly based on classical physics. Furthermore, they often only approximate the classical laws of physics. These aspects are among the primary reasons why classical docking methods are not very accurate. The inaccuracy in turn leads to high false-positive rates in virtual screens.

Considerable work has been carried out to bring quantum chemistry approaches to structure-based drug discovery to address the inaccuracies in traditional docking methods. While it is still rare to see the application of quantum mechanics to molecular docking and virtual screening, the ever-increasing available computing power and the continued improvement of quantum chemical algorithms have led to an increased number of applications in recent years. A book on the application of quantum mechanics in drug discovery was published in 2020, containing a number of methods, protocols, and applications of quantum mechanics in drug discovery [32]. With quantum computers on the horizon, it is conceivable that the computationally demanding quantum chemistry calculations ultimately needed could become more feasible in the future.

5.1. Quantum mechanics(QM)-based docking and virtual screening

QM methods are designed to provide a more detailed description of molecular interaction than classical methods. Consequently, in the last few years, a number of QM-based scoring functions have been developed (e.g., [58, 84, 12]). The high computational costs of quantum chemical methods have been a key obstacle in the application of such methods to molecular docking and virtual screening. Recently, attempts have been made to use the high computational power of GPUs to facilitate such applications. In [66], the GPU-accelerated quantum chemistry software package TeraChem was used to carry out a quantum-based virtual screen of 34 natural compounds. The ligands were docked against the New Delhi metallo-β-lactamase 1 (NDM-1), and a quantum-mechanics/molecular mechanics (QM/MM) approach was used to refine the docking poses and computed energies. Another QM-based high-throughput docking method for virtual screening called QMDS was published in [12]. When applied to ten protein systems, the study observed remarkably high enrichment factors for the majority of the ten proteins.

Classical machine learning approaches that predict quantum mechanical properties of molecules in order to speed up the calculations have also made remarkable progress, such as message passing neural networks applied to quantum chemical problems based on density functional theory [20].

5.2. QM methods for ADMET predictions

QM methods have also been increasingly used to predict ADMET properties of small molecules. ADMET-related properties are of particular interest in drug discovery, as they help parameterize the chances that a small molecule will pass the preclinical and clinical stages of the drug development pipeline. Quantum-based methods can be of critical importance as quantum-mechanical effects play a key role in many of the ADMET-related properties. For example, QM methods can help predict possible metabolic reactions of the small molecules. They can also be used to predict how well small molecules are able to penetrate the cell wall or the blood-brain barrier as these properties are dependent on electronic structure effects. A recent review expands on the application of QM methods in the evaluation of ADMET properties [56].

6. Conclusions and outlook

Virtual screening approaches have been used in drug discovery for the last 20 years. The continuous improvement of virtual screening methodology coupled with access to high-performance computing infrastructure has led to some remarkable successes. Ultra-large virtual screens in particular have proven to be promising alternatives to traditional high-throughput screens, often leading to highly potent hit compounds with comparatively small time and cost investments. ML-based approaches, which have already started to contribute to the field, have an enormous potential in the future of drug discovery and virtual screening-based approaches, and major breakthroughs in this area can be expected in the next few years. Quantum-mechanical approaches, while still in a nascent stage, are also likely to play an increased role in the future along with concomitant developments in computing power. Future progress in quantum computing in particular will allow quantum-mechanical methods to come into their own and bring computational drug discovery to the next level. The future promises bigger and better chemical libraries, faster and more powerful computational resources, a wealth of data to train AI/ML models, better and more rigorous methods to compute binding free energies, high-resolution structural information with biological insights, and algorithms to de-risk the hit candidates for toxicity and enrich bio-availability. While incredible strides have already been made in the field of ultra-large virtual screens, these emerging methods and resources promise to bring about a paradigm shift to drug discovery as we know it.

Table 1: Ligand libraries for ultra-large virtual screens.

An overview of currently available ligand libraries of ultra-large size (i.e., over 100 million ligands). Some of these libraries provide their ligands in the SMILES format, and others (e.g., the ZINC databases and the VirtualFlow versions of the ZINC and REAL databases) provide their ligands in a ready-to-dock format.

Ligand Library	Total number of compounds (billions)	Compounds in ready-to-dock format (billions)	References
ZINC20	1.7	0.7	[33]
ZINC15 (VirtualFlow version, 2018)	1.5	1.5	https://virtual-flow.org/virtualflow-version-zinc15-library
Enamine REAL Database (2022)	4.1	N/A	https://enamine.net/compound-collections/real-compounds/real-database
Enamine REAL Database (VirtualFlow version, 2018)	1.4	1.4	[85], https://virtual-flow.org/real-library
Enamine REAL Space	21	N/A	[28]
Otava’s CHEMriya	12	N/A	https://www.otavachemicals.com/products/chemriya
WuXi’s GalaXi Space	2.3	N/A	https://www.wuxibiology.com/hit-finding-with-wuxi-apptec
GDB-13	1	N/A	[6]
GDB-17	166	N/A	[60]
GDBChEMBL	26	N/A	[9]
PubChem	0.1	N/A	[36]

Acknowledgements

C.G. was supported by NIH grants CA200913, AI037581 and GM129026 to Gerhard Wagner and ARO Grant W911NF1910302 to Arthur Jaffe. H.A. was supported by funding from NIH (GM136859) and the Claudia Adams Barr Program for Innovative Cancer Research. A.J. was supported by funding from NIH (GM129715) awarded to David Beveridge. K.F. would like to thank the Math+ Berlin Mathematics Research Center.

Footnotes

https://otavachemicals.com/products/chemriya

https://www.wuxibiology.com/hit-finding-with-wuxi-apptec

https://www.top500.org/

References

* of special interest

** of outstanding interest

[1]. Acharya A, Agarwal R, Baker MB, Baudry J, Bhowmik D, Boehm S, Byler KG, Chen SY, Coates L, Cooper CJ, Demerdash O, Daidone I, Eblen JD, Ellingson S, Forli S, Glaser J, Gumbart JC, Gunnels J, Hernandez O, Irle S, Kneller DW, Kovalevsky A, Larkin J, Lawrence TJ, LeGrand S, Liu S-H, Mitchell JC, Park G, Parks JM, Pavlova A, Petridis L, Poole D, Pouchard L, Ramanathan A, Rogers DM, Santos-Martins D, Scheinberg A, Sedova A, Shen Y, Smith JC, Smith MD, Soto C, Tsaris A, Thavappiragasam M, Tillack AF, Vermaas JV, Vuong VQ, Yin J, Yoo S, Zahran M, and Zanetti-Polzi L. Supercomputer-Based Ensemble Docking Drug Discovery Pipeline with Application to Covid-19. Journal of Chemical Information and Modeling, 60(12):5832–5852, dec 2020.
[2]. Adeshina Yusuf O, Deeds Eric J, and Karanicolas John. Machine learning classification can reduce false positives in structure-based virtual screening. Proceedings of the National Academy of Sciences, 117(31):18477–18488, 2020.
[3]. Amendola Giorgio and Cosconati Sandro. Pyrmd: A new fully automated ai-powered ligand-based virtual screening tool. Journal of Chemical Information and Modeling, 61(8):3835–3845, 2021.
[4]. Baek Minkyung, DiMaio Frank, Anishchenko Ivan, Dauparas Justas, Ovchinnikov Sergey, Lee Gyu Rie, Wang Jue, Cong Qian, Kinch Lisa N, Schaeffer R Dustin, Millán Claudia, Park Hahnbeom, Adams Carson, Glassman Caleb R, DeGiovanni Andy, Pereira Jose H, Rodrigues Andria V, van Dijk Alberdina A., Ebrecht Ana C, Opperman Diederik J, Sagmeister Theo, Buhlheller Christoph, Pavkov-Keller Tea, Rathinaswamy Manoj K, Dalwadi Udit, Yip Calvin K, Burke John E, Garcia K Christopher, Grishin Nick V, Adams Paul D, Read Randy J, and Baker David. Accurate prediction of protein structures and interactions using a three-track neural network. Science, 8754(July):eabj8754, jul 2021.
[5] **. Bender Brian J., Gahbauer Stefan, Luttens Andreas, Lyu Jiankun, Webb Chase M., Stein Reed M., Fink Elissa A., Balius Trent E., Carlsson Jens, Irwin John J., and Shoichet Brian K.. A practical guide to large-scale docking. Nature Protocols, 2021. Elaborate protocol on how to carry out ultra-large virtual screenings using DOCK3.7.
[6]. Blum Lorenz C and Reymond Jean-Louis. 970 million druglike small molecules for virtual screening in the chemical universe database gdb-13. Journal of the American Chemical Society, 131(25):8732–8733, 2009.
[7]. Bohacek Regine S., McMartin Colin, and Guida Wayne C.. The art and practice of structure-based drug design: A molecular modeling perspective. Medicinal Research Reviews, 16(1):3–50, 1996.
[8]. Brooijmans N and Kuntz ID. Molecular recognition and docking algorithms. Annu Rev Biophys Biomol Struct., 32:335–373, 2003.
[9]. Bühlmann Sven and Reymond Jean-Louis. Chembl-likeness score and database gdbchembl. Frontiers in Chemistry, page 46, 2020.
[10]. Böhm HJ. The computer program ludi: a new method for the de novo design of enzyme inhibitors. J Comput Aided Mol Des., 6(1):61–78, 1992.
[11]. Carhart Raymond E., Smith Dennis H., and Venkataraghavan R. Atom pairs as molecular features in structure-activity studies: definition and applications. Journal of Chemical Information and Computer Sciences, 25(2):64–73, 1985.
[12]. Cavasotto Claudio N. and Aucar M. Gabriela. High-throughput docking using quantum mechanical scoring. Frontiers in Chemistry, 8:246, 2020.
[13]. Clark Matthew and Steger-Hartmann Thomas. A big data approach to the concordance of the toxicity of pharmaceuticals in animals and humans. Regulatory Toxicology and Pharmacology, 96:94–105, 2018.
[14]. Darme Pierre, Dauchez Manuel, Renard Arnaud, Voutquenne-Nazabadioko Laurence, Aubert Dominique, Escotte-Binet Sandie, Renault Jean-Hugues, Villena Isabelle, Steffenel Luiz-Angelo, and Baud Stephanie. Amide v2: High-throughput screening based on autodock-gpu and improved workflow leading to better performance and reliability. International journal of molecular sciences, 22(14):7489, 2021.
[15]. De Vivo Marco, Masetti Matteo, Bottegoni Giovanni, and Cavalli Andrea. Role of molecular dynamics and related methods in drug discovery. Journal of Medicinal Chemistry, 59(9):4035–4061, 2016.
[16]. Falcon Wilfredo Evangelista, Ellingson Sally R., Smith Jeremy C., and Baudry Jerome. Ensemble docking in drug discovery: How many protein configurations from molecular dynamics simulations are needed to reproduce known ligand binding? The Journal of Physical Chemistry B, 123(25):5189–5195, 2019.
[17]. Ferreira Leonardo L G and Andricopulo Adriano D. Admet modeling approaches in drug discovery. Drug discovery today, 24(5):1157–1165, May 2019.
[18]. Gentile F, Yaacoub JJC, Gleave, et al. Artificial intelligence–enabled virtual screening of ultra-large chemical libraries with deep docking. Nat Protoc, pages 1–26, 2022.
[19]. Gentile Francesco, Agrawal Vibudh, Hsing Michael, Ton Anh-Tien, Ban Fuqiang, Norinder Ulf, Gleave Martin E., and Cherkasov Artem. Deep Docking: A Deep Learning Platform for Augmentation of Structure Based Drug Discovery. ACS Central Science, 6(6):939–949, jun 2020.
[20]. Gilmer Justin, Schoenholz Samuel S., Riley Patrick F., Vinyals Oriol, and Dahl George E.. Neural message passing for quantum chemistry. CoRR, abs/1704.01212, 2017.
[21] **. Gorgulla Christoph, Boeszoermenyi Andras, Wang Zi-fu, Fischer Patrick D, Coote Paul W., Das Krishna M. Padmanabha, Malets Yehor S, Radchenko Dmytro S, Moroz Yurii S, Scott David A, Fackeldey Konstantin, Hoffmann Moritz, Iavniuk Iryna, Wagner Gerhard, and Arthanari Haribabu. An open-source drug discovery platform enables ultra-large virtual screens. Nature, 580(7805):663–668, apr 2020. The first virtual screening platform (VirtualFlow) able to routinely carry out ultra-large virtual screenings of billions of compounds.
[22]. Gorgulla Christoph, Çınaroğlu Süleyman Selim, Fischer Patrick D, Fackeldey Konstantin, Wagner Gerhard, and Arthanari Haribabu. VirtualFlow Ants—Ultra-Large Virtual Screenings with Artificial Intelligence Driven Docking Algorithm Based on Ant Colony Optimization. International Journal of Molecular Sciences, 22(11):5807, may 2021.
[23]. Gorgulla Christoph, Fackeldey Konstantin, Wagner Gerhard, and Arthanari Haribabu. Accounting of Receptor Flexibility in Ultra-Large Virtual Screens with VirtualFlow Using a Grey Wolf Optimization Method. Supercomputing Frontiers and Innovations, 7(3):4–12, sep 2020.
[24]. Gorgulla Christoph, Das Krishna M. Padmanabha, Leigh Kendra E., Cespugli Marco, Fischer Patrick D., Wang Zi Fu, Tesseyre Guilhem, Pandita Shreya, Shnapir Alec, Calderaio Anthony, Gechev Minko, Rose Alexander, Lewis Noam, Hutcheson Colin, Yaffe Erez, Luxenburg Roni, Herce Henry D., Durmaz Vedat, Halazonetis Thanos D., Fackeldey Konstantin, Patten JJ, Chuprina Alexander, Dziuba Igor, Plekhova Alla, Moroz Yurii, Radchenko Dmytro, Tarkhanova Olga, Yavnyuk Irina, Gruber Christian, Yust Ryan, Payne Dave, Näär Anders M., Namchuk Mark N., Davey Robert A., Wagner Gerhard, Kinney Jamie, and Arthanari Haribabu. A multi-pronged approach targeting SARS-CoV-2 proteins using ultra-large virtual screening. iScience, 24(2):102021, feb 2021.
[25]. Grant JA, Gallardo MA, and Pickup BT. A fast method of molecular shape comparison: A simple application of a gaussian description of molecular shape. Journal of Computational Chemistry, 17(14):1653–1666, 1996.
[26]. Grebner Christoph, Malmerberg Erik, Shewmaker Andrew, Batista Jose, Nicholls Anthony, and Sadowski Jens. Virtual Screening in the Cloud: How Big Is Big Enough? Journal of Chemical Information and Modeling, 2019.
[27]. Grimm Maximilian, Liu Yang, Yang Xiaocong, Bu Chunya, Xiao Zhixiong, and Cao Yang. Ligmate: A multifeature integration algorithm for ligand-similarity-based virtual screening. Journal of Chemical Information and Modeling, 60(12):6044–6053, 2020.
[28]. Grygorenko Oleksandr O., Radchenko Dmytro S., Dziuba Igor, Chuprina Alexander, Gubina Kateryna E., and Moroz Yurii S.. Generating Multibillion Chemical Space of Readily Accessible Screening Compounds. iScience, 23(11):101681, 2020.
[29]. Gupta Aayush and Zhou Huan-Xiang. Machine learning-enabled pipeline for large-scale virtual drug screening. Journal of Chemical Information and Modeling, 61(9):4236–4244, 2021.
[30]. Göller Andreas H., Kuhnke Lara, Montanari Floriane, Bonin Anne, Schneckener Sebastian, Laak Antonius ter, Wichard Jörg, Lobell Mario, and Hillisch Alexander. Bayer’s in silico admet platform: a journey of machine learning over the past two decades. Drug Discovery Today, 25(9):1702–1709, 2020.
[31]. Hamza Adel, Wei Ning-Ning, and Zhan Chang-Guo. Ligand-based virtual screening approach using a new scoring function. Journal of Chemical Information and Modeling, 52(4):963–974, 2012.
[32] **. Heifetz Alexander. Quantum mechanics in drug discovery. Springer, 2020. Comprehensive book on the current status of quantum mechanical methods in drug discovery.
[33] **. Irwin John J, Tang Khanh G, Young Jennifer, Dandarchuluun Chinzorig, Wong Benjamin R, Khurelbaatar Munkhzul, Moroz Yurii S, Mayfield John, and Sayle Roger A. ZINC20—A Free Ultralarge-Scale Chemical Database for Ligand Discovery. Journal of Chemical Information and Modeling, 60(12):6065–6073, dec 2020. ZINC20 is the successor of ZINC15, and is most widely used compound library for virtual screening. The ZINC library was the first available ultra-large virtual screening library in ready-to-dock format.
[34]. Jones Gareth, Willett Peter, and Glen Robert C.. Molecular recognition of receptor sites using a genetic algorithm with a description of desolvation. Journal of Molecular Biology, 245(1):43–53, 1995.
[35] *. Jumper John, Evans Richard, Pritzel Alexander, Green Tim, Figurnov Michael, Ronneberger Olaf, Tunyasuvunakool Kathryn, Bates Russ, Žídek Augustin, Potapenko Anna, Bridgland Alex, Meyer Clemens, Kohl Simon A A, Ballard Andrew J, Cowie Andrew, Romera-Paredes Bernardino, Nikolov Stanislav, Jain Rishub, Adler Jonas, Back Trevor, Petersen Stig, Reiman David, Clancy Ellen, Zielinski Michal, Steinegger Martin, Pacholska Michalina, Berghammer Tamas, Bodenstein Sebastian, Silver David, Vinyals Oriol, Senior Andrew W., Kavukcuoglu Koray, Kohli Pushmeet, and Hassabis Demis. Highly accurate protein structure prediction with AlphaFold. Nature, 596(7873):583–589, aug 2021. AlphaFold is the first de-novo protein structure prediction method reaching experimental reliability and accuracy.
[36]. Kim Sunghwan, Thiessen Paul A, Bolton Evan E, Chen Jie, Fu Gang, Gindulyte Asta, Han Lianyi, He Jane, He Siqian, Shoemaker Benjamin A, et al. Pubchem substance and compound databases. Nucleic acids research, 44(D1):D1202–D1213, 2016.
[37]. Kirchmair J, Göller AH, Lang D, Kunze J, Testa B, Wilson ID, Glen R, et al. Predicting drug metabolism: experiment and/or computation? Nat Rev Drug Discov, 14:387–404, 2015.
[38]. Koes David Ryan, Baumgartner Matthew P., and Camacho Carlos J.. Lessons Learned in Empirical Scoring with smina from the CSAR 2011 Benchmarking Exercise. Journal of Chemical Information and Modeling, 53(8):1893–1904, aug 2013.
[39]. Koes David Ryan and Camacho Carlos J.. ZINCPharmer: pharmacophore search of the ZINC database. Nucleic Acids Research, 40(W1):W409–W414, 05 2012.
[40]. Korb Oliver, Stützle Thomas, and Exner Thomas E.. PLANTS: Application of ant colony optimization to structure-based drug design. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 4150 LNCS:247–258, 2006.
[41]. Kühlbrandt Werner. The resolution revolution. Science, 343(6178):1443–1444, 2014.
[42]. Lagorce David, Bouslama Lina, Becot Jerome, Miteva Maria A, and Villoutreix Bruno O. FAF-Drugs4: free ADME-tox filtering computations for chemical biology and early stages drug discovery. Bioinformatics, 33(22):3658–3660, 07 2017.
[43]. LeGrand Scott, Scheinberg Aaron, Tillack Andreas F., Thavappiragasam Mathialakan, Vermaas Josh V., Agarwal Rupesh, Larkin Jeff, Poole Duncan, Santos-Martins Diogo, Solis-Vasquez Leonardo, Koch Andreas, Forli Stefano, Hernandez Oscar, Smith Jeremy C., and Sedova Ada. GPU-Accelerated Drug Discovery with Docking on the Summit Supercomputer. In Proceedings of the 11th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics, pages 1–10, New York, NY, USA, sep 2020. ACM.
[44]. Lešnik Samo, Štular Tanja, Brus Boris, Knez Damijan, Gobec Stanislav, Janežič Dušanka, and Konc Janez. Lisica: A software for ligand-based virtual screening and its application for the discovery of butyrylcholinesterase inhibitors. Journal of Chemical Information and Modeling, 55(8):1521–1528, 2015.
[45]. Li Guo-Bo, Yang Ling-Ling, Wang Wen-Jing, Li Lin-Li, and Yang Sheng-Yong. Id-score: A new empirical scoring function based on a comprehensive set of descriptors related to protein–ligand interactions. Journal of Chemical Information and Modeling, 53(3):592–600, 2013.
[46]. Lipinski Christopher A., Lombardo Franco, Dominy Beryl W., and Feeney Paul J.. Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings. Advanced Drug Delivery Reviews, 23(1):3–25, 1997. In Vitro Models for Selection of Development Candidates.
[47]. Liu Song, Zhu Lizhe, Sheong Fu Kit, Wang Wei, and Huang Xuhui. Adaptive partitioning by local density-peaks: An efficient density-based clustering algorithm for analyzing molecular dynamics trajectories. Journal of Computational Chemistry, 38(3):152–160, 2017.
[48]. Lovering Frank, Bikker Jack, and Humblet Christine. Escape from Flatland: Increasing Saturation as an Approach to Improving Clinical Success. Journal of Medicinal Chemistry, 52(21):6752–6756, nov 2009.
[49] *. Lyu Jiankun, Wang Sheng, Balius Trent E, Singh Isha, Levit Anat, Moroz Yurii S, O’Meara Matthew J, Che Tao, Algaa Enkhjargal, Tolmachova Kateryna, Tolmachev Andrey A, Shoichet Brian K, Roth Bryan L, and Irwin John J. Ultra-large library docking for discovering new chemotypes. Nature, 566(7743):224–229, feb 2019. The first ultra-large virtual screen published in a peer reviewed journal, including experimental validation.
[50]. Mayr Andreas, Klambauer Günter, Unterthiner Thomas, and Hochreiter Sepp. Deeptox: Toxicity prediction using deep learning. Frontiers in Environmental Science, 3, 2016.
[51]. Paiva Stacey-Lynn and Crews Craig M. Targeted protein degradation: elements of protac design. Current opinion in chemical biology, 50:111–119, 2019.
[52]. Panteleev Jane, Gao Hua, and Jia Lei. Recent applications of machine learning in medicinal chemistry. Bioorganic and medicinal chemistry letters, 28(17):2807–2815, September 2018.
[53]. Papaleo Elena, Mereghetti Paolo, Fantucci Piercarlo, Grandori Rita, and Gioia Luca De. Free-energy landscape, principal component analysis, and structural clustering to identify representative conformations from molecular dynamics simulations: the myoglobin case. Journal of molecular graphics and modelling, 27(8):889–899, 2009.
[54]. Pason Lukas P. and Sotriffer Christoph A.. Empirical scoring functions for affinity prediction of protein-ligand complexes. Molecular Informatics, 35(11-12):541–548, 2016.
[55]. Platero-Rochart Daniel, González-Alemán Roy, Hernández-Rodríguez Erix W, Leclerc Fabrice, Caballero Julio, and Montero-Cabrera Luis. Rcdpeaks: memory-efficient density peaks clustering of long molecular dynamics. Bioinformatics, 2022.
[56]. Pozzan Alfonso. QM Calculations in ADMET Prediction, pages 285–305. Springer US, New York, NY, 2020.
[57]. Rácz Anita, Bajusz Dávid, Miranda-Quintana Ramón Alain, and Héberger Károly. Machine learning models for classification tasks related to drug safety. Molecular Diversity, 25(3):1409–1424, 2021.
[58]. Raha Kaushik and Merz Kenneth M.. A quantum mechanics-based scoring function: Study of zinc ion-mediated ligand binding. Journal of the American Chemical Society, 126(4):1020–1021, 2004.
[59]. Rogers David and Hahn Mathew. Extended-connectivity fingerprints. Journal of Chemical Information and Modeling, 50(5):742–754, 2010.
[60]. Ruddigkeit Lars, Van Deursen Ruud, Blum Lorenz C., and Reymond Jean Louis. Enumeration of 166 billion organic small molecules in the chemical universe database GDB-17. Journal of Chemical Information and Modeling, 52(11):2864–2875, 2012.
[61]. Decherchi S, Berteotti A, Bottegoni G, Rocchia W, and Cavalli A. The ligand binding mechanism to purine nucleoside phosphorylase elucidated via molecular dynamics and machine learning. Nat Commun., 6:6155, 2015.
[62]. Sadybekov Arman A, Sadybekov Anastasiia V, Liu Yongfeng, Iliopoulos-Tsoutsouvas Christos, Huang Xi-Ping, Pickett Julie, Houser Blake, Patel Nilkanth, Tran Ngan K, Tong Fei, et al. Synthon-based ligand discovery in virtual libraries of over 11 billion compounds. Nature, 601(7893):452–459, 2022.
[63] *. Santos-Martins Diogo, Solis-Vasquez Leonardo, Tillack Andreas F, Sanner Michel F, Koch Andreas, and Forli Stefano. Accelerating AutoDock 4 with GPUs and Gradient-Based Local Search. Journal of Chemical Theory and Computation, 17(2):1060–1073, feb 2021. GPU accelerated version of the well-known AutoDock 4 docking program.
[64]. Shao Jianyin, Tanner Stephen W, Thompson Nephi, and Cheatham Thomas E. Clustering molecular dynamics trajectories: 1. characterizing the performance of different clustering algorithms. Journal of chemical theory and computation, 3(6):2312–2334, 2007.
[65]. Shaw David E., Adams Peter J., Azaria Asaph, Bank Joseph A., Batson Brannon, Bell Alistair, Bergdorf Michael, Bhatt Jhanvi, Butts J. Adam, Correia Timothy, Dirks Robert M., Dror Ron O., Eastwood Michael P., Edwards Bruce, Even Amos, Feldmann Peter, Fenn Michael, Fenton Christopher H., Forte Anthony, Gagliardo Joseph, Gill Gennette, Gorlatova Maria, Greskamp Brian, Grossman JP, Gullingsrud Justin, Harper Anissa, Hasenplaugh William, Heily Mark, Heshmat Benjamin Colin, Hunt Jeremy, Ierardi Douglas J., Iserovich Lev, Jackson Bryan L., Johnson Nick P., Kirk Mollie M., Klepeis John L., Kuskin Jeffrey S., Mackenzie Kenneth M., Mader Roy J., McGowen Richard, McLaughlin Adam, Moraes Mark A., Nasr Mohamed H., Nociolo Lawrence J., O’Donnell Lief, Parker Andrew, Peticolas Jon L., Pocina Goran, Predescu Cristian, Quan Terry, Salmon John K., Schwink Carl, Shim Keun Sup, Siddique Naseer, Spengler Jochen, Szalay Tamas, Tabladillo Raymond, Tartler Reinhard, Taube Andrew G., Theobald Michael, Towles Brian, Vick William, Wang Stanley C., Wazlowski Michael, Weingarten Madeleine J., Williams John M., and Yuh Kevin A.. Anton 3: Twenty microseconds of molecular dynamics simulation before lunch. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC ’21, New York, NY, USA, 2021. Association for Computing Machinery.
[66]. Shi Mingsong, Xu Dingguo, and Zeng Jun. Gpu accelerated quantum virtual screening: Application for the natural inhibitors of new dehli metalloprotein (ndm-1). Frontiers in chemistry, 6:564, 2018.
[67] *. Stein Reed M, Kang Hye Jin, McCorvy John D, Glatfelter Grant C, Jones Anthony J, Che Tao, Slocum Samuel, Huang Xi-Ping, Savych Olena, Moroz Yurii S, Stauch Benjamin, Johansson Linda C, Cherezov Vadim, Kenakin Terry, Irwin John J, Shoichet Brian K, Roth Bryan L, and Dubocovich Margarita L. Virtual discovery of melatonin receptor ligands to modulate circadian rhythms. Nature, feb 2020. One of the early success stories of ultra-large virtual screens, including positive animal data.
[68] **. Sterling Teague and Irwin John J.. ZINC 15 – Ligand Discovery for Everyone. Journal of Chemical Information and Modeling, 55(11):2324–2337, 2015.
[69]. Stojanović Luka, Popović Miloš, Tijanić Nebojša, Rakočević Goran, and Kalinić Marko. Improved scaffold hopping in ligand-based virtual screening using neural representation learning. Journal of Chemical Information and Modeling, 60(10):4629–4639, 2020.
[70]. Sun Chang and Gennis Robert B.. Single-particle cryo-em studies of transmembrane proteins in sma copolymer nanodiscs. Chemistry and Physics of Lipids, 221:114–119, 2019.
[71]. Tao L, Zhang P, Qin C, Chen SY, Zhang C, Chen Z, Zhu F, Yang SY, Wei YQ, and Chen YZ. Recent progresses in the exploration of machine learning methods as in-silico adme prediction tools. Advanced Drug Delivery Reviews, 86:83–100, 2015. In silico ADMET predictions in pharmaceutical research.
[72]. Tomberg Anna and Boström Jonas. Can ‘easy’ chemistry produce complex, diverse, and novel molecules? Drug Discovery Today, 2020.
[73]. Trott Oleg and Olson Arthur J. AutoDock Vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading. Journal of computational chemistry, 31(2):455–61, jan 2010.
[74] *. Tunyasuvunakool Kathryn, Adler Jonas, Wu Zachary, Green Tim, Zielinski Michal, Žídek Augustin, Bridgland Alex, Cowie Andrew, Meyer Clemens, Laydon Agata, Velankar Sameer, Kleywegt Gerard J, Bateman Alex, Evans Richard, Pritzel Alexander, Figurnov Michael, Ronneberger Olaf, Bates Russ, Kohl Simon A A, Potapenko Anna, Ballard Andrew J, Romera-Paredes Bernardino, Nikolov Stanislav, Jain Rishub, Clancy Ellen, Reiman David, Petersen Stig, Senior Andrew W., Kavukcuoglu Koray, Birney Ewan, Kohli Pushmeet, Jumper John, and Hassabis Demis. Highly accurate protein structure prediction for the human proteome. Nature, jul 2021. Accurate protein structures for the human proteome. These structures can be used in structure-based virtual screens.
[75]. Velec Hans, Gohlke Holger, and Klebe Gerhard. Drugscore csd knowledge-based scoring function derived from small molecule crystal data with superior recognition rate of near-native ligand poses and better affinity prediction. Journal of medicinal chemistry, 48:6296–303, 11 2005.
[76]. Verdonk Marcel L., Cole Jason C., Hartshorn Michael J., Murray Christopher W., and Taylor Richard D.. Improved protein–ligand docking using gold. Proteins: Structure, Function, and Bioinformatics, 52(4):609–623, 2003.
[77]. Vo Andy H, Van Vleet Terry R, Gupta Rishi R, Liguori Michael J, and Rao Mohan S. An overview of machine learning and big data for drug toxicity evaluation. Chemical research in toxicology, 33(1):20–37, 2019.
[78]. Waldron Kevin J and Robinson Nigel J. How do bacterial cells ensure that metalloproteins get the correct metal? Nature Reviews Microbiology, 7(1):25–35, 2009.
[79]. Wolf Antje and Kirschner Karl N. Principal component and clustering analysis on molecular dynamics data of the ribosomal l11·23s subdomain. Journal of molecular modeling, 19(2):539–549, 2013.
[80]. Wolf Antje and Kirschner Karl Nicholas. Principal component and clustering analysis on molecular dynamics data of the ribosomal l11·23s subdomain. Journal of Molecular Modeling, 19:539–549, 2012.
[81]. Wong Kin Meng, Tai Hio Kuan, and Siu Shirley W. I.. GWOVina: A grey wolf optimization approach to rigid and flexible receptor docking. Chemical Biology & Drug Design, page cbdd.13764, aug 2020.
[82]. Wu Yunyi and Wang Guanyu. Machine learning based toxicity prediction: from chemical structural description to transcriptome analysis. International journal of molecular sciences, 19(8):2358, 2018.
[83]. Xiong Guoli, Wu Zhenxing, Yi Jiacai, Fu Li, Yang Zhijiang, Hsieh Changyu, Yin Mingzhu, Zeng Xiangxiang, Wu Chengkun, Lu Aiping, Chen Xiang, Hou Tingjun, and Cao Dongsheng. ADMETlab 2.0: an integrated online platform for accurate and comprehensive predictions of ADMET properties. Nucleic Acids Research, 49(W1):W5–W14, 04 2021.
[84]. Yang Z, Liu Y, Chen Z, et al. A quantum mechanics-based halogen bonding scoring function for protein-ligand interactions. J. Mol Model, 21:138, 2015.
[85]. Zinc15 library, snapshot 08/2018, virtualflow version. https://virtual-flow.org/virtualflow-version-zinc15-library, 2022. Accessed: 2022-02-11.
[86]. Zoete Vincent, Daina Antoine, Bovigny Christophe, and Michielin Olivier. Swisssimilarity: A web tool for low to ultra high throughput ligand-based virtual screening. Journal of Chemical Information and Modeling, 56(8):1399–1404, 2016.

[R1] [1].Acharya A, Agarwal R, Baker MB, Baudry J, Bhowmik D, Boehm S, Byler KG, Chen SY, Coates L, Cooper CJ, Demerdash O, Daidone I, Eblen JD, Ellingson S, Forli S, Glaser J, Gumbart JC, Gunnels J, Hernandez O, Irle S, Kneller DW, Kovalevsky A, Larkin J, Lawrence TJ, LeGrand S, Liu S-H, Mitchell JC, Park G, Parks JM, Pavlova A, Petridis L, Poole D, Pouchard L, Ramanathan A, Rogers DM, Santos-Martins D, Scheinberg A, Sedova A, Shen Y, Smith JC, Smith MD, Soto C, Tsaris A, Thavappiragasam M, Tillack AF, Vermaas JV, Vuong VQ, Yin J, Yoo S, Zahran M, and Zanetti-Polzi L. Supercomputer-Based Ensemble Docking Drug Discovery Pipeline with Application to Covid-19. Journal of Chemical Information and Modeling, 60(12):5832–5852, dec 2020. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R2] [2].Adeshina Yusuf O, Deeds Eric J, and Karanicolas John. Machine learning classification can reduce false positives in structure-based virtual screening. Proceedings of the National Academy of Sciences, 117(31):18477–18488, 2020. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R3] [3].Amendola Giorgio and Cosconati Sandro. Pyrmd: A new fully automated ai-powered ligand-based virtual screening tool. Journal of Chemical Information and Modeling, 61(8):3835–3845, 2021. [DOI] [PubMed] [Google Scholar]

[R4] [4].Baek Minkyung, DiMaio Frank, Anishchenko Ivan, Dauparas Justas, Ovchinnikov Sergey, Lee Gyu Rie, Wang Jue, Cong Qian, Kinch Lisa N, Schaeffer R Dustin, Millán Claudia, Park Hahnbeom, Adams Carson, Glassman Caleb R, DeGiovanni Andy, Pereira Jose H, Rodrigues Andria V, van Dijk Alberdina A., Ebrecht Ana C, Opperman Diederik J, Sagmeister Theo, Buhlheller Christoph, Pavkov-Keller Tea, Rathinaswamy Manoj K, Dalwadi Udit, Yip Calvin K, Burke John E, Garcia K Christopher, Grishin Nick V, Adams Paul D, Read Randy J, and Baker David. Accurate prediction of protein structures and interactions using a three-track neural network. Science, 8754(July):eabj8754, jul 2021. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R5] [5] **. Bender Brian J., Gahbauer Stefan, Luttens Andreas, Lyu Jiankun, Webb Chase M., Stein Reed M., Fink Elissa A., Balius Trent E., Carlsson Jens, Irwin John J., and Shoichet Brian K.. A practical guide to large-scale docking. Nature Protocols, 2021. Elaborate protocol on how to carry out ultra-large virtual screenings using DOCK3.7.

[R6] [6].Blum Lorenz C and Reymond Jean-Louis. 970 million druglike small molecules for virtual screening in the chemical universe database gdb-13. Journal of the American Chemical Society, 131(25):8732–8733, 2009. [DOI] [PubMed] [Google Scholar]

[R7] [7].Bohacek Regine S., McMartin Colin, and Guida Wayne C.. The art and practice of structure-based drug design: A molecular modeling perspective. Medicinal Research Reviews, 16(1):3–50, 1996. [DOI] [PubMed] [Google Scholar]

[R8] [8].Brooijmans N and Kuntz ID. Molecular recognition and docking algorithms. Annu Rev Biophys Biomol Struct., 32:335–373, 2003. [DOI] [PubMed] [Google Scholar]

[R9] [9].Bühlmann Sven and Reymond Jean-Louis. Chembl-likeness score and database gdbchembl. Frontiers in Chemistry, page 46, 2020. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R10] [10].Böhm HJ. The computer program ludi: a new method for the de novo design of enzyme inhibitors. J Comput Aided Mol Des., 6(1):61–78, 1992. [DOI] [PubMed] [Google Scholar]

[11]. Carhart Raymond E., Smith Dennis H., and Venkataraghavan R. Atom pairs as molecular features in structure-activity studies: definition and applications. Journal of Chemical Information and Computer Sciences, 25(2):64–73, 1985.

[12]. Cavasotto Claudio N. and Aucar M. Gabriela. High-throughput docking using quantum mechanical scoring. Frontiers in Chemistry, 8:246, 2020.

[13]. Clark Matthew and Steger-Hartmann Thomas. A big data approach to the concordance of the toxicity of pharmaceuticals in animals and humans. Regulatory Toxicology and Pharmacology, 96:94–105, 2018.

[14]. Darme Pierre, Dauchez Manuel, Renard Arnaud, Voutquenne-Nazabadioko Laurence, Aubert Dominique, Escotte-Binet Sandie, Renault Jean-Hugues, Villena Isabelle, Steffenel Luiz-Angelo, and Baud Stephanie. AMIDE v2: High-throughput screening based on AutoDock-GPU and improved workflow leading to better performance and reliability. International Journal of Molecular Sciences, 22(14):7489, 2021.

[15]. De Vivo Marco, Masetti Matteo, Bottegoni Giovanni, and Cavalli Andrea. Role of molecular dynamics and related methods in drug discovery. Journal of Medicinal Chemistry, 59(9):4035–4061, 2016.

[16]. Falcon Wilfredo Evangelista, Ellingson Sally R., Smith Jeremy C., and Baudry Jerome. Ensemble docking in drug discovery: How many protein configurations from molecular dynamics simulations are needed to reproduce known ligand binding? The Journal of Physical Chemistry B, 123(25):5189–5195, 2019.

[17]. Ferreira Leonardo L G and Andricopulo Adriano D. ADMET modeling approaches in drug discovery. Drug Discovery Today, 24(5):1157–1165, May 2019.

[18]. Gentile F, Yaacoub JC, Gleave J, et al. Artificial intelligence–enabled virtual screening of ultra-large chemical libraries with deep docking. Nat Protoc, pages 1–26, 2022.

[19]. Gentile Francesco, Agrawal Vibudh, Hsing Michael, Ton Anh-Tien, Ban Fuqiang, Norinder Ulf, Gleave Martin E., and Cherkasov Artem. Deep Docking: A Deep Learning Platform for Augmentation of Structure Based Drug Discovery. ACS Central Science, 6(6):939–949, jun 2020.

[20]. Gilmer Justin, Schoenholz Samuel S., Riley Patrick F., Vinyals Oriol, and Dahl George E. Neural message passing for quantum chemistry. CoRR, abs/1704.01212, 2017.

[21] **. Gorgulla Christoph, Boeszoermenyi Andras, Wang Zi-fu, Fischer Patrick D, Coote Paul W., Das Krishna M. Padmanabha, Malets Yehor S, Radchenko Dmytro S, Moroz Yurii S, Scott David A, Fackeldey Konstantin, Hoffmann Moritz, Iavniuk Iryna, Wagner Gerhard, and Arthanari Haribabu. An open-source drug discovery platform enables ultra-large virtual screens. Nature, 580(7805):663–668, apr 2020. The first virtual screening platform (VirtualFlow) able to routinely carry out ultra-large virtual screenings of billions of compounds.

[22]. Gorgulla Christoph, Çınaroğlu Süleyman Selim, Fischer Patrick D, Fackeldey Konstantin, Wagner Gerhard, and Arthanari Haribabu. VirtualFlow Ants—Ultra-Large Virtual Screenings with Artificial Intelligence Driven Docking Algorithm Based on Ant Colony Optimization. International Journal of Molecular Sciences, 22(11):5807, may 2021.

[23]. Gorgulla Christoph, Fackeldey Konstantin, Wagner Gerhard, and Arthanari Haribabu. Accounting of Receptor Flexibility in Ultra-Large Virtual Screens with VirtualFlow Using a Grey Wolf Optimization Method. Supercomputing Frontiers and Innovations, 7(3):4–12, sep 2020.

[24]. Gorgulla Christoph, Das Krishna M. Padmanabha, Leigh Kendra E., Cespugli Marco, Fischer Patrick D., Wang Zi Fu, Tesseyre Guilhem, Pandita Shreya, Shnapir Alec, Calderaio Anthony, Gechev Minko, Rose Alexander, Lewis Noam, Hutcheson Colin, Yaffe Erez, Luxenburg Roni, Herce Henry D., Durmaz Vedat, Halazonetis Thanos D., Fackeldey Konstantin, Patten JJ, Chuprina Alexander, Dziuba Igor, Plekhova Alla, Moroz Yurii, Radchenko Dmytro, Tarkhanova Olga, Yavnyuk Irina, Gruber Christian, Yust Ryan, Payne Dave, Näär Anders M., Namchuk Mark N., Davey Robert A., Wagner Gerhard, Kinney Jamie, and Arthanari Haribabu. A multi-pronged approach targeting SARS-CoV-2 proteins using ultra-large virtual screening. iScience, 24(2):102021, feb 2021.

[25]. Grant JA, Gallardo MA, and Pickup BT. A fast method of molecular shape comparison: A simple application of a Gaussian description of molecular shape. Journal of Computational Chemistry, 17(14):1653–1666, 1996.

[26]. Grebner Christoph, Malmerberg Erik, Shewmaker Andrew, Batista Jose, Nicholls Anthony, and Sadowski Jens. Virtual Screening in the Cloud: How Big Is Big Enough? Journal of Chemical Information and Modeling, 2019.

[27]. Grimm Maximilian, Liu Yang, Yang Xiaocong, Bu Chunya, Xiao Zhixiong, and Cao Yang. LigMate: A multifeature integration algorithm for ligand-similarity-based virtual screening. Journal of Chemical Information and Modeling, 60(12):6044–6053, 2020.

[28]. Grygorenko Oleksandr O., Radchenko Dmytro S., Dziuba Igor, Chuprina Alexander, Gubina Kateryna E., and Moroz Yurii S. Generating Multibillion Chemical Space of Readily Accessible Screening Compounds. iScience, 23(11):101681, 2020.

[29]. Gupta Aayush and Zhou Huan-Xiang. Machine learning-enabled pipeline for large-scale virtual drug screening. Journal of Chemical Information and Modeling, 61(9):4236–4244, 2021.

[30]. Göller Andreas H., Kuhnke Lara, Montanari Floriane, Bonin Anne, Schneckener Sebastian, Laak Antonius ter, Wichard Jörg, Lobell Mario, and Hillisch Alexander. Bayer’s in silico ADMET platform: a journey of machine learning over the past two decades. Drug Discovery Today, 25(9):1702–1709, 2020.

[31]. Hamza Adel, Wei Ning-Ning, and Zhan Chang-Guo. Ligand-based virtual screening approach using a new scoring function. Journal of Chemical Information and Modeling, 52(4):963–974, 2012.

[32] **. Heifetz Alexander. Quantum mechanics in drug discovery. Springer, 2020. Comprehensive book on the current status of quantum mechanical methods in drug discovery.

[33] **. Irwin John J, Tang Khanh G, Young Jennifer, Dandarchuluun Chinzorig, Wong Benjamin R, Khurelbaatar Munkhzul, Moroz Yurii S, Mayfield John, and Sayle Roger A. ZINC20—A Free Ultralarge-Scale Chemical Database for Ligand Discovery. Journal of Chemical Information and Modeling, 60(12):6065–6073, dec 2020. ZINC20 is the successor of ZINC15 and is the most widely used compound library for virtual screening. The ZINC library was the first available ultra-large virtual screening library in ready-to-dock format.

[34]. Jones Gareth, Willett Peter, and Glen Robert C. Molecular recognition of receptor sites using a genetic algorithm with a description of desolvation. Journal of Molecular Biology, 245(1):43–53, 1995.

[35] *. Jumper John, Evans Richard, Pritzel Alexander, Green Tim, Figurnov Michael, Ronneberger Olaf, Tunyasuvunakool Kathryn, Bates Russ, Žídek Augustin, Potapenko Anna, Bridgland Alex, Meyer Clemens, Kohl Simon A A, Ballard Andrew J, Cowie Andrew, Romera-Paredes Bernardino, Nikolov Stanislav, Jain Rishub, Adler Jonas, Back Trevor, Petersen Stig, Reiman David, Clancy Ellen, Zielinski Michal, Steinegger Martin, Pacholska Michalina, Berghammer Tamas, Bodenstein Sebastian, Silver David, Vinyals Oriol, Senior Andrew W., Kavukcuoglu Koray, Kohli Pushmeet, and Hassabis Demis. Highly accurate protein structure prediction with AlphaFold. Nature, 596(7873):583–589, aug 2021. AlphaFold is the first de-novo protein structure prediction method reaching experimental reliability and accuracy.

[36]. Kim Sunghwan, Thiessen Paul A, Bolton Evan E, Chen Jie, Fu Gang, Gindulyte Asta, Han Lianyi, He Jane, He Siqian, Shoemaker Benjamin A, et al. PubChem substance and compound databases. Nucleic Acids Research, 44(D1):D1202–D1213, 2016.

[37]. Kirchmair J, Göller AH, Lang D, Kunze J, Testa B, Wilson ID, Glen R, et al. Predicting drug metabolism: experiment and/or computation? Nat Rev Drug Discov, 14:387–404, 2015.

[38]. Koes David Ryan, Baumgartner Matthew P., and Camacho Carlos J. Lessons Learned in Empirical Scoring with smina from the CSAR 2011 Benchmarking Exercise. Journal of Chemical Information and Modeling, 53(8):1893–1904, aug 2013.

[39]. Koes David Ryan and Camacho Carlos J. ZINCPharmer: pharmacophore search of the ZINC database. Nucleic Acids Research, 40(W1):W409–W414, 05 2012.

[40]. Korb Oliver, Stützle Thomas, and Exner Thomas E. PLANTS: Application of ant colony optimization to structure-based drug design. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 4150 LNCS:247–258, 2006.

[41]. Kühlbrandt Werner. The resolution revolution. Science, 343(6178):1443–1444, 2014.

[42]. Lagorce David, Bouslama Lina, Becot Jerome, Miteva Maria A, and Villoutreix Bruno O. FAF-Drugs4: free ADME-tox filtering computations for chemical biology and early stages drug discovery. Bioinformatics, 33(22):3658–3660, 07 2017.

[43]. LeGrand Scott, Scheinberg Aaron, Tillack Andreas F., Thavappiragasam Mathialakan, Vermaas Josh V., Agarwal Rupesh, Larkin Jeff, Poole Duncan, Santos-Martins Diogo, Solis-Vasquez Leonardo, Koch Andreas, Forli Stefano, Hernandez Oscar, Smith Jeremy C., and Sedova Ada. GPU-Accelerated Drug Discovery with Docking on the Summit Supercomputer. In Proceedings of the 11th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics, pages 1–10, New York, NY, USA, sep 2020. ACM.

[44]. Lešnik Samo, Štular Tanja, Brus Boris, Knez Damijan, Gobec Stanislav, Janežič Dušanka, and Konc Janez. LiSiCA: A software for ligand-based virtual screening and its application for the discovery of butyrylcholinesterase inhibitors. Journal of Chemical Information and Modeling, 55(8):1521–1528, 2015.

[45]. Li Guo-Bo, Yang Ling-Ling, Wang Wen-Jing, Li Lin-Li, and Yang Sheng-Yong. ID-Score: A new empirical scoring function based on a comprehensive set of descriptors related to protein–ligand interactions. Journal of Chemical Information and Modeling, 53(3):592–600, 2013.

[46]. Lipinski Christopher A., Lombardo Franco, Dominy Beryl W., and Feeney Paul J. Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings. Advanced Drug Delivery Reviews, 23(1):3–25, 1997. In Vitro Models for Selection of Development Candidates.

[47]. Liu Song, Zhu Lizhe, Sheong Fu Kit, Wang Wei, and Huang Xuhui. Adaptive partitioning by local density-peaks: An efficient density-based clustering algorithm for analyzing molecular dynamics trajectories. Journal of Computational Chemistry, 38(3):152–160, 2017.

[48]. Lovering Frank, Bikker Jack, and Humblet Christine. Escape from Flatland: Increasing Saturation as an Approach to Improving Clinical Success. Journal of Medicinal Chemistry, 52(21):6752–6756, nov 2009.

[49] *. Lyu Jiankun, Wang Sheng, Balius Trent E, Singh Isha, Levit Anat, Moroz Yurii S, O’Meara Matthew J, Che Tao, Algaa Enkhjargal, Tolmachova Kateryna, Tolmachev Andrey A, Shoichet Brian K, Roth Bryan L, and Irwin John J. Ultra-large library docking for discovering new chemotypes. Nature, 566(7743):224–229, feb 2019. The first ultra-large virtual screen published in a peer-reviewed journal, including experimental validation.

[50]. Mayr Andreas, Klambauer Günter, Unterthiner Thomas, and Hochreiter Sepp. DeepTox: Toxicity prediction using deep learning. Frontiers in Environmental Science, 3, 2016.

[51]. Paiva Stacey-Lynn and Crews Craig M. Targeted protein degradation: elements of PROTAC design. Current Opinion in Chemical Biology, 50:111–119, 2019.

[52]. Panteleev Jane, Gao Hua, and Jia Lei. Recent applications of machine learning in medicinal chemistry. Bioorganic and Medicinal Chemistry Letters, 28(17):2807–2815, September 2018.

[53]. Papaleo Elena, Mereghetti Paolo, Fantucci Piercarlo, Grandori Rita, and Gioia Luca De. Free-energy landscape, principal component analysis, and structural clustering to identify representative conformations from molecular dynamics simulations: the myoglobin case. Journal of Molecular Graphics and Modelling, 27(8):889–899, 2009.

[54]. Pason Lukas P. and Sotriffer Christoph A. Empirical scoring functions for affinity prediction of protein-ligand complexes. Molecular Informatics, 35(11-12):541–548, 2016.

[55]. Platero-Rochart Daniel, González-Alemán Roy, Hernández-Rodríguez Erix W, Leclerc Fabrice, Caballero Julio, and Montero-Cabrera Luis. RCDPeaks: memory-efficient density peaks clustering of long molecular dynamics. Bioinformatics, 2022.

[56]. Pozzan Alfonso. QM Calculations in ADMET Prediction, pages 285–305. Springer US, New York, NY, 2020.

[57]. Rácz Anita, Bajusz Dávid, Miranda-Quintana Ramón Alain, and Héberger Károly. Machine learning models for classification tasks related to drug safety. Molecular Diversity, 25(3):1409–1424, 2021.

[58]. Raha Kaushik and Merz Kenneth M. A quantum mechanics-based scoring function: Study of zinc ion-mediated ligand binding. Journal of the American Chemical Society, 126(4):1020–1021, 2004.

[59]. Rogers David and Hahn Mathew. Extended-connectivity fingerprints. Journal of Chemical Information and Modeling, 50(5):742–754, 2010.

[60]. Ruddigkeit Lars, Van Deursen Ruud, Blum Lorenz C., and Reymond Jean Louis. Enumeration of 166 billion organic small molecules in the chemical universe database GDB-17. Journal of Chemical Information and Modeling, 52(11):2864–2875, 2012.

[61]. Decherchi S, Berteotti A, Bottegoni G, Rocchia W, and Cavalli A. The ligand binding mechanism to purine nucleoside phosphorylase elucidated via molecular dynamics and machine learning. Nat Commun., 6:6155, 2015.

[62]. Sadybekov Arman A, Sadybekov Anastasiia V, Liu Yongfeng, Iliopoulos-Tsoutsouvas Christos, Huang Xi-Ping, Pickett Julie, Houser Blake, Patel Nilkanth, Tran Ngan K, Tong Fei, et al. Synthon-based ligand discovery in virtual libraries of over 11 billion compounds. Nature, 601(7893):452–459, 2022.

[63] *. Santos-Martins Diogo, Solis-Vasquez Leonardo, Tillack Andreas F, Sanner Michel F, Koch Andreas, and Forli Stefano. Accelerating AutoDock 4 with GPUs and Gradient-Based Local Search. Journal of Chemical Theory and Computation, 17(2):1060–1073, feb 2021. GPU-accelerated version of the well-known AutoDock 4 docking program.

[64]. Shao Jianyin, Tanner Stephen W, Thompson Nephi, and Cheatham Thomas E. Clustering molecular dynamics trajectories: 1. characterizing the performance of different clustering algorithms. Journal of Chemical Theory and Computation, 3(6):2312–2334, 2007.

[65]. Shaw David E., Adams Peter J., Azaria Asaph, Bank Joseph A., Batson Brannon, Bell Alistair, Bergdorf Michael, Bhatt Jhanvi, Butts J. Adam, Correia Timothy, Dirks Robert M., Dror Ron O., Eastwood Michael P., Edwards Bruce, Even Amos, Feldmann Peter, Fenn Michael, Fenton Christopher H., Forte Anthony, Gagliardo Joseph, Gill Gennette, Gorlatova Maria, Greskamp Brian, Grossman JP, Gullingsrud Justin, Harper Anissa, Hasenplaugh William, Heily Mark, Heshmat Benjamin Colin, Hunt Jeremy, Ierardi Douglas J., Iserovich Lev, Jackson Bryan L., Johnson Nick P., Kirk Mollie M., Klepeis John L., Kuskin Jeffrey S., Mackenzie Kenneth M., Mader Roy J., McGowen Richard, McLaughlin Adam, Moraes Mark A., Nasr Mohamed H., Nociolo Lawrence J., O’Donnell Lief, Parker Andrew, Peticolas Jon L., Pocina Goran, Predescu Cristian, Quan Terry, Salmon John K., Schwink Carl, Shim Keun Sup, Siddique Naseer, Spengler Jochen, Szalay Tamas, Tabladillo Raymond, Tartler Reinhard, Taube Andrew G., Theobald Michael, Towles Brian, Vick William, Wang Stanley C., Wazlowski Michael, Weingarten Madeleine J., Williams John M., and Yuh Kevin A. Anton 3: Twenty microseconds of molecular dynamics simulation before lunch. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC ’21, New York, NY, USA, 2021. Association for Computing Machinery.

[66]. Shi Mingsong, Xu Dingguo, and Zeng Jun. GPU accelerated quantum virtual screening: Application for the natural inhibitors of New Dehli metalloprotein (NDM-1). Frontiers in Chemistry, 6:564, 2018.

[67] *. Stein Reed M, Kang Hye Jin, McCorvy John D, Glatfelter Grant C, Jones Anthony J, Che Tao, Slocum Samuel, Huang Xi-Ping, Savych Olena, Moroz Yurii S, Stauch Benjamin, Johansson Linda C, Cherezov Vadim, Kenakin Terry, Irwin John J, Shoichet Brian K, Roth Bryan L, and Dubocovich Margarita L. Virtual discovery of melatonin receptor ligands to modulate circadian rhythms. Nature, feb 2020. One of the early success stories of ultra-large virtual screens, including positive animal data.

[68] **. Sterling Teague and Irwin John J. ZINC 15 – Ligand Discovery for Everyone. Journal of Chemical Information and Modeling, 55(11):2324–2337, 2015.

[69]. Stojanović Luka, Popović Miloš, Tijanić Nebojša, Rakočević Goran, and Kalinić Marko. Improved scaffold hopping in ligand-based virtual screening using neural representation learning. Journal of Chemical Information and Modeling, 60(10):4629–4639, 2020.

[70]. Sun Chang and Gennis Robert B. Single-particle cryo-EM studies of transmembrane proteins in SMA copolymer nanodiscs. Chemistry and Physics of Lipids, 221:114–119, 2019.

[71]. Tao L, Zhang P, Qin C, Chen SY, Zhang C, Chen Z, Zhu F, Yang SY, Wei YQ, and Chen YZ. Recent progresses in the exploration of machine learning methods as in-silico ADME prediction tools. Advanced Drug Delivery Reviews, 86:83–100, 2015. In silico ADMET predictions in pharmaceutical research.

[72]. Tomberg Anna and Boström Jonas. Can ‘easy’ chemistry produce complex, diverse, and novel molecules? Drug Discovery Today, 2020.

[73]. Trott Oleg and Olson Arthur J. AutoDock Vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading. Journal of Computational Chemistry, 31(2):455–61, jan 2010.

[74] *. Tunyasuvunakool Kathryn, Adler Jonas, Wu Zachary, Green Tim, Zielinski Michal, Žídek Augustin, Bridgland Alex, Cowie Andrew, Meyer Clemens, Laydon Agata, Velankar Sameer, Kleywegt Gerard J, Bateman Alex, Evans Richard, Pritzel Alexander, Figurnov Michael, Ronneberger Olaf, Bates Russ, Kohl Simon A A, Potapenko Anna, Ballard Andrew J, Romera-Paredes Bernardino, Nikolov Stanislav, Jain Rishub, Clancy Ellen, Reiman David, Petersen Stig, Senior Andrew W., Kavukcuoglu Koray, Birney Ewan, Kohli Pushmeet, Jumper John, and Hassabis Demis. Highly accurate protein structure prediction for the human proteome. Nature, jul 2021. Accurate protein structures for the human proteome. These structures can be used in structure-based virtual screens.

[75]. Velec Hans, Gohlke Holger, and Klebe Gerhard. DrugScore(CSD)-knowledge-based scoring function derived from small molecule crystal data with superior recognition rate of near-native ligand poses and better affinity prediction. Journal of Medicinal Chemistry, 48:6296–303, 11 2005.

[76]. Verdonk Marcel L., Cole Jason C., Hartshorn Michael J., Murray Christopher W., and Taylor Richard D. Improved protein–ligand docking using GOLD. Proteins: Structure, Function, and Bioinformatics, 52(4):609–623, 2003.

[77]. Vo Andy H, Van Vleet Terry R, Gupta Rishi R, Liguori Michael J, and Rao Mohan S. An overview of machine learning and big data for drug toxicity evaluation. Chemical Research in Toxicology, 33(1):20–37, 2019.

[78]. Waldron Kevin J and Robinson Nigel J. How do bacterial cells ensure that metalloproteins get the correct metal? Nature Reviews Microbiology, 7(1):25–35, 2009.

[79]. Wolf Antje and Kirschner Karl N. Principal component and clustering analysis on molecular dynamics data of the ribosomal L11·23S subdomain. Journal of Molecular Modeling, 19(2):539–549, 2013.

[80]. Wolf Antje and Kirschner Karl Nicholas. Principal component and clustering analysis on molecular dynamics data of the ribosomal L11·23S subdomain. Journal of Molecular Modeling, 19:539–549, 2012.

[81]. Wong Kin Meng, Tai Hio Kuan, and Siu Shirley W. I. GWOVina: A grey wolf optimization approach to rigid and flexible receptor docking. Chemical Biology & Drug Design, page cbdd.13764, aug 2020.

[82]. Wu Yunyi and Wang Guanyu. Machine learning based toxicity prediction: from chemical structural description to transcriptome analysis. International Journal of Molecular Sciences, 19(8):2358, 2018.

[83]. Xiong Guoli, Wu Zhenxing, Yi Jiacai, Fu Li, Yang Zhijiang, Hsieh Changyu, Yin Mingzhu, Zeng Xiangxiang, Wu Chengkun, Lu Aiping, Chen Xiang, Hou Tingjun, and Cao Dongsheng. ADMETlab 2.0: an integrated online platform for accurate and comprehensive predictions of ADMET properties. Nucleic Acids Research, 49(W1):W5–W14, 04 2021.

[84]. Yang Z, Liu Y, Chen Z, et al. A quantum mechanics-based halogen bonding scoring function for protein-ligand interactions. J Mol Model, 21:138, 2015.

[85]. ZINC15 library, snapshot 08/2018, VirtualFlow version. https://virtual-flow.org/virtualflow-version-zinc15-library, 2022. Accessed: 2022-02-11.

[86]. Zoete Vincent, Daina Antoine, Bovigny Christophe, and Michielin Olivier. SwissSimilarity: A web tool for low to ultra high throughput ligand-based virtual screening. Journal of Chemical Information and Modeling, 56(8):1399–1404, 2016.
