Journal of Pharmaceutical Analysis. 2025 Apr 9;15(6):101298. doi: 10.1016/j.jpha.2025.101298

druglikeFilter 1.0: An AI powered filter for collectively measuring the drug-likeness of compounds

Minjie Mou a,1, Yintao Zhang a,1, Yuntao Qian a,1, Zhimeng Zhou a,1, Yang Liao a, Tianle Niu b, Wei Hu c, Yuanhao Chen a, Ruoyu Jiang a, Hongping Zhao d, Haibin Dai c,⁎⁎⁎, Yang Zhang b,⁎⁎, Tingting Fu a,
PMCID: PMC12268052  PMID: 40678482

Abstract

Advancements in artificial intelligence (AI) and emerging technologies are rapidly expanding the exploration of chemical space, facilitating innovative drug discovery. However, the transformation of novel compounds into safe and effective drugs remains a lengthy, high-risk, and costly process. Comprehensive early-stage evaluation is essential for reducing costs and improving the success rate of drug development. Despite this need, no comprehensive tool currently supports systematic evaluation and efficient screening. Here, we present druglikeFilter, a deep learning-based framework designed to assess drug-likeness across four critical dimensions: 1) physicochemical rule evaluated by systematic determination, 2) toxicity alert investigated from multiple perspectives, 3) binding affinity measured by dual-path analysis, and 4) compound synthesizability assessed by retro-route prediction. By enabling automated, multidimensional filtering of compound libraries, druglikeFilter not only streamlines the drug development process but also plays a crucial role in advancing research efforts towards viable drug candidates, which can be freely accessed at https://idrblab.org/drugfilter/.

Keywords: Drug-likeness, Virtual screening, Deep learning, Drug discovery

Highlights

  • Developed druglikeFilter, a deep learning-based tool to accelerate drug discovery.

  • Enabled the collective evaluation of drug-likeness across four critical dimensions.

  • Automated the filtering of compound libraries to identify drug-like molecules.

1. Introduction

Artificial intelligence (AI) has become an integral tool in pharmaceutical research, with generative AI (GAI) exponentially expanding the chemical universe by assisting in the design of novel compounds [1]. However, transforming these compounds into safe and effective drugs remains a lengthy, high-risk, and costly process, posing numerous challenges and requiring considerable effort [2]. To reduce costs and improve the success rate of advancing novel compounds to viable drug candidates, early-stage evaluation and screening of potential drug-like molecules is a critical strategy [3].

Deep learning approaches have increasingly been employed to distinguish drug candidates from small molecules lacking drug-like properties, thereby minimizing unnecessary biological and clinical testing costs [4]. These algorithms leverage statistical models trained on large datasets to predict physicochemical properties and derive rules for drug-likeness. While physicochemical parameters provide an overall characterization of compounds, substructure-based analysis helps identify toxicity alerts by detecting unstable, reactive, or toxic moieties [5]. Furthermore, investigating the binding affinities of compounds to their targets is one of the most crucial steps in pharmaceutical research, and many types of frameworks have been used for compound-protein interaction (CPI) analysis [6]. Another practical problem in drug development is synthesizability, as certain structures may be difficult or infeasible to synthesize [7]. Deep learning can facilitate this process by predicting synthetic routes and assessing the feasibility of synthesis, thereby accelerating the transformation of compounds to drug candidates [8]. Given the complexity of drug behavior in vivo, a multidimensional evaluation is essential for improving the likelihood of successful clinical translation.

Despite these advancements, a versatile tool for evaluating and screening compounds across multiple dimensions, covering the physicochemical properties, toxicity, binding affinity, and synthesizability of novel compounds, is still lacking. Specifically, several web tools integrating in silico predictive models have been developed to facilitate the identification of one or more molecules with a specific set of properties, such as absorption, distribution, metabolism, excretion, and toxicity (ADMET)-related properties [9–11]. Notable tools include ADMETlab [9], admetSAR [10], and SwissADME [11]. Among them, ADMETlab is a web server that provides a systematic evaluation of ADMET-related parameters along with physicochemical and medicinal chemistry properties [9]; the admetSAR platform focuses on ADMET property assessment and optimization [10]; and SwissADME offers rapid predictive models for ADME-related parameters [11]. However, these tools primarily predict ADMET-related properties, underscoring the need for a more comprehensive platform for drug-likeness evaluation across multiple dimensions.

In this study, druglikeFilter, a versatile web tool that facilitates comprehensive compound evaluation and automatic filtering, was developed by building upon extensive prior research efforts. As illustrated in Fig. 1, druglikeFilter measures drug-likeness by 1) evaluating physicochemical rule by systematic determination, 2) investigating toxicity alert from multiple perspectives, 3) measuring binding affinity by dual-path analysis, and 4) assessing compound synthesizability by retro-route prediction. This versatile and open-source tool aims to accelerate the discovery of druggable molecules, and the relevant information can be fed back to optimize the AI model to further accelerate the development of new drug molecules. The druglikeFilter is accessible at https://idrblab.org/drugfilter/.

Fig. 1.


The general workflow of druglikeFilter. It collectively measures the drug-likeness of compounds across four critical dimensions: physicochemical rule evaluated by systematic determination, toxicity alert investigated from multiple perspectives, binding affinity measured by dual-path analysis, and compound synthesizability assessed by retro-route prediction.

2. Experimental

2.1. Evaluating physicochemical property

Physicochemical properties provide an overarching description of ADMET-related parameters for compounds [12], which can be used to rapidly filter out non-drug-like small molecules to reduce unnecessary biological and clinical testing costs. The druglikeFilter reads all the input molecules in the form of simplified molecular input line entry system (SMILES) or structure-data file (SDF), and then calculates 15 commonly used physicochemical properties (molecular weight, hydrogen bond (H-bond) acceptors, H-bond donors, calculated octanol-water partition coefficient logP (ClogP), rotatable bonds, topological polar surface area (TPSA), molar refractivity, total atoms, carbon atoms, heteroatoms, heavy atoms, aromatic rings, aliphatic rings, double bonds, and triple bonds). These calculations are primarily based on RDKit [13] and Pybel [14], utilizing additional Python libraries such as SciPy [15], NumPy [16], and scikit-learn [17] for enhanced accuracy and functionality. Throughout the history of drug development, medicinal chemists have established numerous physicochemical rules to streamline candidate selection. druglikeFilter integrates 12 practical rules (Table S1 [18–30]) from the literature, comprising 5 property-based rules and 7 substructure-based rules. These rules help eliminate non-druggable molecules, promiscuous compounds, and assay-interfering structures, improving the efficiency of early-stage drug screening.

2.2. Investigating toxicity alert

Molecular safety is a critical concern in drug development, and safety problems often lead to costly failures in the drug discovery process [31]. To identify compounds with potential toxicity risks, a focus on structural alerts in compound structures is essential, as these structural fragments serve as functional units in drug design. druglikeFilter compiles approximately 600 toxicity alerts derived from relevant preclinical and clinical studies. These substructures are closely associated with several types of toxicity, including acute toxicity (20 alerts) [32], skin sensitization (151 alerts) [5], genotoxic carcinogenicity (103 alerts) [5], non-genotoxic carcinogenicity (23 alerts) [33], and others [34]. Furthermore, druglikeFilter incorporates CardioTox net, a deep learning framework based on a fully connected neural network and a graph convolutional neural network, designed to predict cardiotoxicity with higher accuracy than conventional methods [35]. The model undergoes stepwise training to classify molecules as human ether-à-go-go-related gene (hERG) blockers or non-hERG blockers, using a probability threshold of ≥0.5 to indicate a potential hERG blockade risk, which is strongly associated with cardiac toxicity. By integrating comprehensive structural alerts and advanced predictive modeling, druglikeFilter provides valuable guidance for designing safer drug candidates, mitigating toxicity risks, and minimizing the costs associated with developing unsafe compounds.
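The ≥0.5 decision rule described above can be illustrated with a short sketch. It assumes the hERG-blockade probabilities have already been produced by a model such as CardioTox net; the function name `herg_label`, the label strings, and the example probabilities are hypothetical and are not part of druglikeFilter itself.

```python
HERG_THRESHOLD = 0.5  # probability cutoff stated in the text


def herg_label(probability, threshold=HERG_THRESHOLD):
    """Map a predicted hERG-blockade probability to a class label."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    # A probability at or above the threshold flags a potential blocker.
    return "hERG blocker" if probability >= threshold else "non-hERG blocker"


# Hypothetical model outputs, used only for illustration.
predictions = {"cpd_a": 0.12, "cpd_b": 0.50, "cpd_c": 0.93}
flagged = [name for name, p in predictions.items()
           if herg_label(p) == "hERG blocker"]
```

Note that the comparison is inclusive, so a compound scoring exactly 0.5 is flagged.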

2.3. Measuring binding affinity

Identifying interactions between chemical compounds and proteins is a critical step in evaluating target druggability and screening potential candidates. The druglikeFilter measures CPIs using both structure-based (molecular docking) and sequence-based (AI model) approaches. The structure-based method leverages molecular docking, a computational technique that evaluates ligand binding to receptor proteins based on geometric and energetic factors [36]. The druglikeFilter incorporates AutoDock Vina, one of the fastest and most widely used open-source docking programs, for this purpose [37]. During docking, the uploaded protein structure undergoes preprocessing, including cleaning, bond reconstruction, and hydrogen addition. The binding pocket is defined based on the original ligand or custom parameters, followed by force field optimization. However, for some drug targets, especially challenging ones, information on the protein structure is not available [38]. In such cases, druglikeFilter employs a sequence-based AI model, transformerCPI2.0 [39], to predict the CPI using the protein sequence as input. This model utilizes a transformer encoder and a graph convolutional network to extract protein features and compound features, respectively. Then, an interaction decoder with self-attention mechanisms learns the interaction patterns, and a classifier predicts the probabilities of CPI. Finally, the compounds are sorted by docking score or prediction score, and users can apply automated filtering by binding affinity, for example by keeping the top 10%. This dual-path approach (structure-based and sequence-based) allows druglikeFilter to handle a range of scenarios, enhancing its utility in drug discovery and evaluation.
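The final ranking-and-filtering step can be sketched as follows. This is a minimal illustration rather than the tool's implementation: the function `top_fraction` and the example scores are hypothetical. It relies on two conventions: AutoDock Vina reports estimated binding energies in kcal/mol, so more negative scores rank higher, whereas a classifier probability ranks higher when it is larger.

```python
import math


def top_fraction(scores, fraction=0.10, lower_is_better=True):
    """Return the best-ranked (compound, score) pairs covering `fraction` of the input.

    lower_is_better=True suits docking energies (more negative is better);
    lower_is_better=False suits prediction probabilities.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    ranked = sorted(scores.items(), key=lambda kv: kv[1],
                    reverse=not lower_is_better)
    if not ranked:
        return []
    # Round up so that at least one compound is kept; the small epsilon
    # guards against floating-point noise such as 30 * 0.1 = 3.0000000000000004.
    keep = max(1, math.ceil(len(ranked) * fraction - 1e-9))
    return ranked[:keep]


# Hypothetical docking scores (kcal/mol) for ten compounds.
docking = {f"cpd{i}": -5.0 - 0.5 * i for i in range(10)}
best = top_fraction(docking)  # top 10% of ten compounds is one compound
```

For sequence-based prediction scores, the same helper would be called with `lower_is_better=False`.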

2.4. Assessing compound synthesizability

Chemical synthesis is a prerequisite for experimental studies on biological activity, toxicity, and other drug-related properties, making it one of the key limiting steps in drug development [7]. AI-driven approaches can significantly accelerate this process by predicting synthetic accessibility and assisting in retrosynthetic planning [40]. The druglikeFilter first estimates the synthetic accessibility of compounds with RDKit, providing an overall assessment of synthesis feasibility. However, accessibility alone is insufficient for complex molecules requiring detailed synthetic route planning. The druglikeFilter tackles this challenge with retrosynthesis, which deconstructs complex molecules into simpler building blocks to identify viable synthetic pathways [41,42]. To efficiently determine high-quality synthetic pathways, druglikeFilter integrates Retro∗, a neural-based A∗-like algorithm designed to bridge the gap between computational algorithms and the complexity of chemical synthesis [43]. During retrosynthetic planning, an “AND-OR” search tree is used, with an iteration limit set to 200, ensuring a balance between computational efficiency and thorough exploration of synthetic options. This combination enables druglikeFilter to deliver accurate and practical synthesis predictions that align with the demands of chemical research.
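To make the idea of a cost-guided search with an iteration limit concrete, the toy sketch below runs a best-first search over a hand-written reaction table. It is a deliberate simplification and not the Retro∗ algorithm: the learned neural value function is replaced by a zero heuristic (so the search reduces to uniform-cost search), and the AND-OR tree is linearized into states that list the molecules still to be made. The molecule names, reaction table, and costs are all hypothetical.

```python
import heapq
from itertools import count


def plan_route(target, reactions, building_blocks, max_iterations=200):
    """Best-first retrosynthetic search over a toy reaction table.

    reactions maps a product to a list of (cost, [reactants]) options
    (the OR choice); every reactant of a chosen option must itself be
    resolved (the AND requirement). Returns (total_cost, steps) or None
    if no route is found within max_iterations expansions.
    """
    tie = count()  # tie-breaker so the heap never compares step lists
    queue = [(0.0, next(tie), (target,), [])]
    for _ in range(max_iterations):
        if not queue:
            return None
        cost, _tie, unsolved, steps = heapq.heappop(queue)
        # Purchasable building blocks need no further disconnection.
        unsolved = tuple(m for m in unsolved if m not in building_blocks)
        if not unsolved:
            return cost, steps
        mol, rest = unsolved[0], unsolved[1:]
        for step_cost, reactants in reactions.get(mol, []):
            heapq.heappush(queue, (cost + step_cost, next(tie),
                                   rest + tuple(reactants),
                                   steps + [(mol, list(reactants))]))
    return None


# Hypothetical reaction table: "T" can be made from A + B or, at higher cost, from C.
toy_reactions = {
    "T": [(1.0, ["A", "B"]), (5.0, ["C"])],
    "A": [(1.0, ["C", "D"])],
}
toy_blocks = {"B", "C", "D"}
route = plan_route("T", toy_reactions, toy_blocks)
```

In this toy table the two-step route through `A` is returned because its total cost is lower than the direct but expensive disconnection to `C`. Lowering `max_iterations` trades completeness for speed, which is the balance the iteration limit of 200 is meant to strike.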

3. Results and discussion

3.1. The web server implementation of druglikeFilter

The web server is built on the Django framework, with the front end developed using ElementUI and Vue. The front end handles queries and ensures an intuitive user experience by dynamically communicating with the back end. As shown in Fig. S1, the website (https://idrblab.org/drugfilter/) is accessible to all users without requiring login credentials, and the platform has been rigorously tested for compatibility with major web browsers, including Mozilla Firefox, Google Chrome, Microsoft Edge, and Apple Safari.

3.2. The versatile applications of druglikeFilter

As illustrated in Fig. 1, druglikeFilter consists of four stages, each comprising two functional modules. This tool can process approximately 10,000 molecules simultaneously. For the uploaded molecular libraries, druglikeFilter calculates and provides detailed information for each compound, then integrates physicochemical properties with drug-likeness rules to enable systematic and efficient profiling, automatically discarding molecules with low drug potential and advancing promising candidates to the next stage. Subsequently, reliable toxicophore rules are employed along with an advanced AI model to investigate toxicity alerts from multiple perspectives, aiming to reduce toxicity risks during the early stages of drug discovery. To identify potential bioactive molecules, druglikeFilter evaluates the binding affinities of compounds to their targets from both sequence-based and structure-based perspectives. In the final stage, the tool provides a comprehensive assessment of synthetic difficulty along with retrosynthetic planning to bridge the gap between computational predictions and practical synthesis. By integrating these stages, druglikeFilter enables comprehensive property analysis, systematic drug-likeness evaluation, and automated filtering of high-risk compounds. Notably, these modules can be used independently or in combination, allowing flexible application to diverse research needs. Although the AI-powered druglikeFilter streamlines drug screening by filtering non-drug-like molecules through a multidimensional evaluation framework, different screening strategies should be adapted to specific research objectives. As no single approach is universally optimal, druglikeFilter supports customizable screening schemes, further expanding its applicability. In summary, the deep learning-based tool druglikeFilter has significant potential to accelerate drug discovery.
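The staged, composable filtering described above can be sketched as a list of predicates applied in sequence. This is an illustrative assumption about how such a scheme could be organized, not the server's code. The compound records and field names are hypothetical and are assumed to be precomputed. The cutoffs for Lipinski violations (two or more), hERG probability (≥0.5), and SAscore (6 or higher) are the values quoted elsewhere in this article, while the docking cutoff of -7.0 kcal/mol is an arbitrary placeholder.

```python
def run_pipeline(compounds, stages):
    """Apply each (name, keep) stage in order.

    Returns the surviving compounds and the number remaining after each stage.
    """
    survivors = list(compounds)
    remaining_after = {}
    for name, keep in stages:
        survivors = [c for c in survivors if keep(c)]
        remaining_after[name] = len(survivors)
    return survivors, remaining_after


# Stages can be reordered or dropped to build a custom screening scheme.
STAGES = [
    ("physicochemical", lambda c: c["ro5_violations"] < 2),
    ("toxicity", lambda c: c["herg_probability"] < 0.5),
    ("binding", lambda c: c["docking_score"] <= -7.0),  # hypothetical cutoff
    ("synthesizability", lambda c: c["sa_score"] < 6),
]

# Hypothetical, precomputed records for five compounds.
library = [
    {"id": "cpd1", "ro5_violations": 0, "herg_probability": 0.10, "docking_score": -8.2, "sa_score": 3.1},
    {"id": "cpd2", "ro5_violations": 2, "herg_probability": 0.05, "docking_score": -9.1, "sa_score": 2.0},
    {"id": "cpd3", "ro5_violations": 1, "herg_probability": 0.80, "docking_score": -8.8, "sa_score": 2.7},
    {"id": "cpd4", "ro5_violations": 0, "herg_probability": 0.20, "docking_score": -5.0, "sa_score": 2.5},
    {"id": "cpd5", "ro5_violations": 0, "herg_probability": 0.30, "docking_score": -9.0, "sa_score": 7.4},
]

hits, counts = run_pipeline(library, STAGES)
```

Because each stage is an independent predicate, running a single module corresponds to passing a one-element stage list.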

3.3. Physicochemical property evaluated by systematic determination

Physicochemical properties play a critical role in determining molecular behavior in vivo, giving a global description of the ADMET-related parameters [12]. The druglikeFilter utilizes 15 commonly used physicochemical properties to identify one or more molecules with a specific set of properties of interest, by customizing the filtering scheme. Over time, medicinal chemists have developed multiple rule-based frameworks to efficiently exclude small molecules lacking drug-like features, thereby reducing unnecessary biological and clinical testing costs. The druglikeFilter leverages 12 drug-likeness rules (Fig. 2 and Table S1 [18–30]) to filter out compounds that lack drug-like potential. One of the most influential guidelines for drug-likeness is Lipinski's rule of five (Ro5), which indicates a higher likelihood of poor absorption or permeation if a compound meets two or more of the following criteria: molecular weight > 500, ClogP > 5, number of H-bond donors > 5, and number of H-bond acceptors > 10 [18]. For example, probucol, an approved cholesterol-lowering drug, has a molecular weight of 517 and a ClogP of 9.9, exceeding Ro5 thresholds. These physicochemical properties likely contribute to its poor and erratic gastrointestinal absorption (approximately 7%) in clinical use [44].
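The Ro5 check, as stated above, can be written as a short sketch that counts violated criteria from precomputed properties. The class and function names are hypothetical. The molecular weight and ClogP for probucol are the values quoted in the text, while the H-bond donor and acceptor counts are illustrative placeholders that stay below the thresholds.

```python
from dataclasses import dataclass


@dataclass
class Properties:
    mol_weight: float
    clogp: float
    hbd: int  # number of H-bond donors
    hba: int  # number of H-bond acceptors


def ro5_violations(p):
    """Return the Ro5 criteria that the compound violates."""
    checks = {
        "molecular weight > 500": p.mol_weight > 500,
        "ClogP > 5": p.clogp > 5,
        "H-bond donors > 5": p.hbd > 5,
        "H-bond acceptors > 10": p.hba > 10,
    }
    return [name for name, violated in checks.items() if violated]


def flags_poor_absorption(p):
    """Two or more violations suggest poor absorption or permeation."""
    return len(ro5_violations(p)) >= 2


# hbd and hba are placeholder values, not taken from the article.
probucol = Properties(mol_weight=517, clogp=9.9, hbd=2, hba=4)
violations = ro5_violations(probucol)
```

With the quoted values, probucol violates the molecular weight and ClogP criteria and is therefore flagged.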

Fig. 2.


Physicochemical rule evaluated by systematic determination. (A) 12 drug-likeness rules were used to exclude compounds lacking drug potential in the druglikeFilter. (B) 15 physicochemical properties for flexible measuring of the drug-likeness of compounds. GSK: GlaxoSmithKline; PAINS: pan assay interference compounds; BMS: Bristol Myers Squibb; ClogP: calculated octanol-water partition coefficient logP; TPSA: topological polar surface area; H-bond: hydrogen bond.

Moreover, several drug-likeness rules in druglikeFilter that contain many substructures (Fig. 2 and Table S1 [18–30]), such as pan assay interference compounds (PAINS) [23], the Bristol Myers Squibb (BMS) rule [24], and the Brenk rule [25], can help to identify compounds prone to assay interference and potential promiscuity. These noisy compounds tend to produce false-positive results, complicating decision-making for medicinal chemists. For example, the pyrazolidine-3,5-dione fragment in the structure of sulfinpyrazone is flagged as potentially risky (PAINS) by druglikeFilter (Fig. 2), emphasizing the need for cautious evaluation in drug design. The druglikeFilter combines physicochemical properties with drug-likeness rules to provide a simple, systematic, and efficient protocol for early-stage compound assessment. The tool eliminates low-potential drug candidates in the initial screening phase, increasing screening efficiency. It also supports customization with task-specific protocols and is designed to be user-friendly, so that non-specialists can use it.

3.4. Toxicity alert investigated from multiple perspectives

Drug withdrawals due to toxicity represent one of the costliest failures in drug development [31]. Therefore, identifying structural features that may lead to toxicity or adverse effects at therapeutic doses is a critical early-stage consideration in drug discovery. Over time, a wealth of knowledge has accumulated regarding structural fragments and associated toxicity data, with particular emphasis on structural alerts [5]. These alerts are linked to toxicity risks, and many of them exhibit high chemical reactivity or can be bioactivated by enzymes into reactive intermediates. To facilitate safer drug design, druglikeFilter leverages this knowledge by collecting approximately 600 toxicity alerts identified from relevant preclinical and clinical studies to comprehensively assess toxicity risks. These alerts cover multiple types of toxicity, including acute toxicity, cardiac toxicity, skin sensitization, genotoxic carcinogenicity, non-genotoxic carcinogenicity, and others [5,32–34]. For instance, the phenothiazine substructure may carry a risk of acute toxicity, while the mechlorethamine substructure may be linked to non-genotoxic carcinogenicity (Fig. 3).

Fig. 3.


Toxicity alert investigated from multiple perspectives. (A) Structural alerts summarized from preclinical studies and an artificial intelligence (AI) model were used to identify compounds with toxicity risks. (B) Structural alerts derived from relevant clinical studies were used to address toxicity concerns.

In addition to leveraging toxicophore rules accumulated in previous studies, druglikeFilter also integrates an AI model for toxicity prediction [35]. This model, the deep learning-based CardioTox net, serves as a robust tool to screen small molecules for potential hERG channel blockade, a key indicator of cardiotoxicity in drug discovery [35]. The druglikeFilter labels molecules with a probability of ≥0.5 as having cardiac toxicity risk (Fig. 3), with higher star ratings indicating higher risk. In this way, druglikeFilter combines reliable toxicophore rules with a cutting-edge AI model to assess compound toxicity from multiple perspectives and exclude toxic compounds, aiming to increase the success rate of identifying viable drug candidates by minimizing toxicity risks in the early stages of drug discovery.

3.5. Binding affinity measured by dual-path analysis

The biological functions of proteins are often modulated by ligand binding, making binding affinity prediction a critical task in drug discovery [39]. Molecular docking is a widely used approach for this purpose, as it estimates the binding position, orientation, and conformation of a ligand based on protein structure [36]. The druglikeFilter integrates AutoDock Vina, one of the fastest and most widely used open-source docking programs, to measure binding affinities of compounds and identify potential hits from chemical libraries [37]. For example, apoptosis signal-regulating kinase 1 (ASK1) plays a key role in inflammation and apoptosis, making its inhibitors promising therapeutic candidates for kidney and pulmonary diseases [45]. As shown in Fig. 4, druglikeFilter facilitates the identification of high-affinity ASK1 inhibitors by evaluating CPI through molecular docking. Compounds are ranked based on docking scores, with higher star ratings indicating stronger binding affinity, thereby streamlining hit selection. Additionally, insights into the binding affinity and binding mode of these hits provide valuable guidance for further lead compound optimization.

Fig. 4.


Binding affinity measured by dual-path analysis: binding affinity prediction based on the (A) protein structure and (B) the protein sequence. SMILES: simplified molecular input line entry system.

However, structural data are unavailable for some challenging drug targets [38]. Although recent advances in protein structure prediction, such as AlphaFold and RoseTTAFold, have been successful, not all predicted structures are suitable for structure-based drug design (SBDD). In such cases, sequence-based deep learning models offer a viable alternative for binding affinity prediction [39]. The druglikeFilter incorporates transformerCPI2.0, a deep learning model that uses large-scale biological data mining and modeling to predict CPI based on protein sequences, effectively expanding binding affinity prediction to targets lacking structural data. For example, mismatch repair endonuclease PMS2 is an important target in cancer therapy, but the lack of available structures limits virtual screening [46]. The druglikeFilter uses the PMS2 sequence to screen for potential hits from extensive chemical databases, identifying high-affinity hits by prediction score ranking (Fig. 4). Furthermore, natural products and traditional medicines are often discovered through their observed therapeutic effects, but their mechanisms are not yet clear [47]. Conventional approaches use network pharmacology, molecular docking, and molecular dynamics simulations to predict their drug targets, but these methods typically depend on protein structures. The druglikeFilter overcomes this limitation by enabling target identification even in the absence of structural data, significantly enhancing target recognition and drug discovery. By combining structure-based and sequence-based analysis, druglikeFilter efficiently predicts CPIs and binding affinity. This dual-path approach effectively circumvents structural limitations, enabling a broader exploration of potential drug targets and the discovery of new active molecules.

3.6. Compound synthesizability assessed by retro-route prediction

Chemical synthesis is a prerequisite for experimental studies on biological activity, toxicity, and other drug-related properties, making it one of the rate-limiting steps in drug development [7]. Advances in AI-driven synthetic accessibility estimation and retrosynthetic planning have the potential to significantly accelerate drug discovery [40,48]. The druglikeFilter first assesses the synthetic difficulty of compounds by calculating the synthetic accessibility score (SAscore), which ranges from 1 to 10 [49]. A score of 6 or higher indicates that a compound is challenging to synthesize, with higher star ratings indicating greater difficulty. In most cases, SAscore effectively distinguishes feasible molecules from infeasible ones, providing a valuable metric for assessing synthetic difficulty.

Beyond synthetic accessibility, the efficient and practical synthesis of valuable molecules requires well-designed retrosynthetic routes [8]. Retrosynthesis breaks down complex molecules into simpler building blocks, bridging the gap between computational algorithms and the complexities of chemical synthesis [41,42]. However, the vast number of possible chemical reactions creates a large search space, making retrosynthetic planning challenging even for experienced chemists [43]. To address this, druglikeFilter integrates Retro∗, a neural-based A∗-like algorithm that efficiently finds high-quality synthetic routes to accelerate the drug discovery process [43]. As illustrated in Fig. 5, druglikeFilter provides comprehensive retrosynthetic planning for the compound (CCn1cc(Cl)c2cnc(NC(=O)c3ccc([C@@](C)(O)CO)c(C)c3)cc21). The retrosynthesis involves a 4-step reaction, producing 9 synthetic intermediates. It also provides detailed insights, including reaction feasibility for each step, route cost, and estimated time. By offering both an overall synthetic difficulty assessment and retrosynthetic planning, druglikeFilter plays a pivotal role in advancing compound research and streamlining the drug development process.

Fig. 5.


Compound synthesizability assessed via retro-route prediction. (A) Retrosynthetic route of the compound predicted by druglikeFilter. The synthetic intermediates and reaction feasibility for each step are highlighted. (B) Detailed information about the retrosynthetic route. (C) The workflow of the Retro∗ model for predicting retrosynthetic routes in druglikeFilter. SMILES: simplified molecular input line entry system.

4. Conclusions

Drug development is a complex process, where comprehensive evaluation and early filtering of potential drug-like molecules based on physicochemical properties, toxicity alerts, target affinity, and synthesizability are crucial strategies for reducing costs and increasing the success rate of advancing novel compounds to viable drug candidates. In this study, we developed druglikeFilter, a deep learning-based tool for collectively measuring the drug-likeness of compounds across these four critical dimensions. First, druglikeFilter integrates physicochemical properties and drug-likeness rules to provide a systematic and efficient protocol for early-stage compound assessment. Second, it incorporates reliable toxicophore rules alongside an advanced AI-driven toxicity model to conduct multi-perspective toxicity assessments, aiming to improve the identification of viable drug candidates by reducing toxicity risks. Third, its binding affinity evaluation leverages both sequence-based and structure-based predictions, enabling a comprehensive exploration of drug-target interactions and the identification of active molecules. Finally, druglikeFilter assesses synthetic feasibility by providing overall synthetic difficulty scores and retrosynthetic planning, bridging the gap between computational predictions and practical synthesis strategies. These four parts can be used individually or in combination to suit a wider range of tasks. By enabling comprehensive and automatic filtering of compound libraries to identify drug-like molecules, druglikeFilter not only streamlines the drug development process but also plays a crucial role in advancing research efforts towards viable drug candidates.

CRediT authorship contribution statement

Minjie Mou: Writing – original draft, Investigation, Conceptualization, Methodology. Yintao Zhang: Data curation, Software. Yuntao Qian: Software, Investigation. Zhimeng Zhou: Data curation, Methodology. Yang Liao: Software, Writing – review & editing. Tianle Niu: Investigation. Wei Hu: Validation. Yuanhao Chen: Software. Ruoyu Jiang: Writing – review & editing. Hongping Zhao: Validation. Haibin Dai: Writing – review & editing, Validation. Yang Zhang: Funding acquisition, Writing – original draft. Tingting Fu: Writing – original draft, Funding acquisition, Supervision.

Data availability

The druglikeFilter is a versatile deep learning-based tool for drug-likeness assessment and is freely accessible at: https://idrblab.org/drugfilter/.

Declaration of competing interest

The authors declare that there are no conflicts of interest.

Acknowledgments

This work was financially supported by the National Natural Science Foundation of China (Grant No.: 82404511), Postdoctoral Fellowship Program of China Postdoctoral Science Foundation (CPSF) (Grant No.: GZC20232345), Priority-Funded Postdoctoral Research Project, Zhejiang Province, China (Grant No.: ZJ2024012), and Central Guidance on Local Science and Technology Development Fund of Hebei Province, China (Grant No.: 226Z2605G).

Footnotes

Peer review under responsibility of Xi'an Jiaotong University.

Appendix A

Supplementary data to this article can be found online at https://doi.org/10.1016/j.jpha.2025.101298.

Contributor Information

Haibin Dai, Email: haibindai@zju.edu.cn.

Yang Zhang, Email: zhangyang@hebmu.edu.cn.

Tingting Fu, Email: futt@zju.edu.cn.

References

  • 1. Yang X., Wang Y., Byrne R., et al. Concepts of artificial intelligence for computer-assisted drug discovery. Chem. Rev. 2019;119:10520–10594. doi: 10.1021/acs.chemrev.8b00728.
  • 2. Trajanoska K., Bhérer C., Taliun D., et al. From target discovery to clinical drug development with human genetics. Nature. 2023;620:737–745. doi: 10.1038/s41586-023-06388-8.
  • 3. Gorgulla C., Boeszoermenyi A., Wang Z.F., et al. An open-source drug discovery platform enables ultra-large virtual screens. Nature. 2020;580:663–668. doi: 10.1038/s41586-020-2117-z.
  • 4. Sadybekov A.V., Katritch V. Computational approaches streamlining drug discovery. Nature. 2023;616:673–685. doi: 10.1038/s41586-023-05905-z.
  • 5. Sushko I., Salmina E., Potemkin V.A., et al. ToxAlerts: A Web server of structural alerts for toxic chemicals and compounds with potential adverse reactions. J. Chem. Inf. Model. 2012;52:2310–2316. doi: 10.1021/ci300245q.
  • 6. Leeson P.D., Bento A.P., Gaulton A., et al. Target-based evaluation of “drug-like” properties and ligand efficiencies. J. Med. Chem. 2021;64:7210–7230. doi: 10.1021/acs.jmedchem.1c00416.
  • 7. Campos K.R., Coleman P.J., Alvarez J.C., et al. The importance of synthetic chemistry in the pharmaceutical industry. Science. 2019;363 doi: 10.1126/science.aat0805.
  • 8. Wang Y., Pang C., Wang Y., et al. Retrosynthesis prediction with an interpretable deep-learning framework based on molecular assembly tasks. Nat. Commun. 2023;14 doi: 10.1038/s41467-023-41698-5.
  • 9. Fu L., Shi S., Yi J., et al. ADMETlab 3.0: An updated comprehensive online ADMET prediction platform enhanced with broader coverage, improved performance, API functionality and decision support. Nucleic Acids Res. 2024;52:W422–W431. doi: 10.1093/nar/gkae236.
  • 10. Gu Y., Yu Z., Wang Y., et al. admetSAR3.0: A comprehensive platform for exploration, prediction and optimization of chemical ADMET properties. Nucleic Acids Res. 2024;52:W432–W438. doi: 10.1093/nar/gkae298.
  • 11. Daina A., Michielin O., Zoete V. SwissADME: A free web tool to evaluate pharmacokinetics, drug-likeness and medicinal chemistry friendliness of small molecules. Sci. Rep. 2017;7 doi: 10.1038/srep42717.
  • 12. Bickerton G.R., Paolini G.V., Besnard J., et al. Quantifying the chemical beauty of drugs. Nat. Chem. 2012;4:90–98. doi: 10.1038/nchem.1243.
  • 13. RDKit: Open-source cheminformatics software. https://www.rdkit.org. (Accessed 1 March 2024).
  • 14. O’Boyle N.M., Morley C., Hutchison G.R. Pybel: A Python wrapper for the OpenBabel cheminformatics toolkit. Chem. Cent. J. 2008;2 doi: 10.1186/1752-153X-2-5.
  • 15. Virtanen P., Gommers R., Oliphant T.E., et al. SciPy 1.0: Fundamental algorithms for scientific computing in Python. Nat. Methods. 2020;17:261–272. doi: 10.1038/s41592-019-0686-2.
  • 16. Harris C.R., Millman K.J., van der Walt S.J., et al. Array programming with NumPy. Nature. 2020;585:357–362. doi: 10.1038/s41586-020-2649-2.
  • 17. Pedregosa F., Varoquaux G., Gramfort A., et al. Scikit-learn: Machine learning in Python. arXiv. 2012. doi: 10.48550/arXiv.1201.0490.
  • 18. Lipinski C.A., Lombardo F., Dominy B.W., et al. Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings. Adv. Drug Deliv. Rev. 2001;46:3–26. doi: 10.1016/s0169-409x(00)00129-0.
  • 19. Congreve M., Carr R., Murray C., et al. A ‘rule of three’ for fragment-based lead discovery? Drug Discov. Today. 2003;8:876–877. doi: 10.1016/s1359-6446(03)02831-9.
  • 20.Veber D.F., Johnson S.R., Cheng H.Y., et al. Molecular properties that influence the oral bioavailability of drug candidates. J. Med. Chem. 2002;45:2615–2623. doi: 10.1021/jm020017n. [DOI] [PubMed] [Google Scholar]
  • 21.Muegge I., Heald S.L., Brittelli D. Simple selection criteria for drug-like chemical matter. J. Med. Chem. 2001;44:1841–1846. doi: 10.1021/jm015507e. [DOI] [PubMed] [Google Scholar]
  • 22.Ghose A.K., Viswanadhan V.N., Wendoloski J.J. A knowledge-based approach in designing combinatorial or medicinal chemistry libraries for drug discovery. 1. A qualitative and quantitative characterization of known drug databases. J. Comb. Chem. 1999;1:55–68. doi: 10.1021/cc9800071. [DOI] [PubMed] [Google Scholar]
  • 23.Baell J.B., Holloway G.A. New substructure filters for removal of pan assay interference compounds (PAINS) from screening libraries and for their exclusion in bioassays. J. Med. Chem. 2010;53:2719–2740. doi: 10.1021/jm901137j. [DOI] [PubMed] [Google Scholar]
  • 24.Pearce B.C., Sofia M.J., Good A.C., et al. An empirical process for the design of high-throughput screening deck filters. J. Chem. Inf. Model. 2006;46:1060–1068. doi: 10.1021/ci050504m. [DOI] [PubMed] [Google Scholar]
  • 25.Brenk R., Schipani A., James D., et al. Lessons learnt from assembling screening libraries for drug discovery for neglected diseases. ChemMedChem. 2008;3:435–444. doi: 10.1002/cmdc.200700139. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26.Huth J.R., Mendoza R., Olejniczak E.T., et al. ALARM NMR: A rapid and robust experimental method to detect reactive false positives in biochemical screens. J. Am. Chem. Soc. 2005;127:217–224. doi: 10.1021/ja0455547. [DOI] [PubMed] [Google Scholar]
  • 27.Metz J.T., Huth J.R., Hajduk P.J. Enhancement of chemical rules for predicting compound reactivity towards protein thiol groups. J. Comput. Aided Mol. Des. 2007;21:139–144. doi: 10.1007/s10822-007-9109-z. [DOI] [PubMed] [Google Scholar]
  • 28.Schorpp K., Rothenaigner I., Salmina E., et al. Identification of small-molecule frequent hitters from AlphaScreen high-throughput screens. J. Biomol. Screen. 2014;19:715–726. doi: 10.1177/1087057113516861. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29.Brenke J.K., Salmina E.S., Ringelstetter L., et al. Identification of small-molecule frequent hitters of glutathione S-transferase-glutathione interaction. J. Biomol. Screen. 2016;21:596–607. doi: 10.1177/1087057116639992. [DOI] [PubMed] [Google Scholar]
  • 30.Chakravorty S.J., Chan J., Greenwood M.N., et al. Nuisance compounds, PAINS filters, and dark chemical matter in the GSK HTS collection. SLAS Discov. 2018;23:532–545. doi: 10.1177/2472555218768497. [DOI] [PubMed] [Google Scholar]
  • 31.Wu L., Yan B., Han J., et al. TOXRIC: A comprehensive database of toxicological data and benchmarks. Nucleic Acids Res. 2023;51:D1432–D1445. doi: 10.1093/nar/gkac1074. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32.Tinkov O.V., Grigorev V.Y., Polishchuk P.G., et al. QSAR investigation of acute toxicity of organic compounds during oral administration to mice. Biomed. Khim. 2019;65:123–132. doi: 10.18097/PBMC20196502123. [DOI] [PubMed] [Google Scholar]
  • 33.Benigni R., Bossa C. Mechanisms of chemical carcinogenicity and mutagenicity: A review with implications for predictive toxicology. Chem. Rev. 2011;111:2507–2536. doi: 10.1021/cr100222q. [DOI] [PubMed] [Google Scholar]
  • 34.Nascimento C.M.C., Moura P.G., Pimentel A.S. Generating structural alerts from toxicology datasets using the local interpretable model-agnostic explanations method. Digit. Discov. 2023;2:1311–1325. [Google Scholar]
  • 35.Karim A., Lee M., Balle T., et al. CardioTox net: A robust predictor for hERG channel blockade based on deep learning meta-feature ensembles. J. Cheminform. 2021;13 doi: 10.1186/s13321-021-00541-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 36.Forli S., Huey R., Pique M.E., et al. Computational protein-ligand docking and virtual drug screening with the AutoDock suite. Nat. Protoc. 2016;11:905–919. doi: 10.1038/nprot.2016.051. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37.Eberhardt J., Santos-Martins D., Tillack A.F., et al. AutoDock vina 1.2.0: New docking methods, expanded force field, and Python bindings. J. Chem. Inf. Model. 2021;61:3891–3898. doi: 10.1021/acs.jcim.1c00203. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38.Chenthamarakshan V., Hoffman S.C., Owen C.D., et al. Accelerating drug target inhibitor discovery with a deep generative foundation model. Sci. Adv. 2023;9 doi: 10.1126/sciadv.adg7865. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39.Chen L., Fan Z., Chang J., et al. Sequence-based drug design as a concept in computational drug design. Nat. Commun. 2023;14 doi: 10.1038/s41467-023-39856-w. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40.Lin K., Xu Y., Pei J., et al. Automatic retrosynthetic route planning using template-free models. Chem. Sci. 2020;11:3355–3364. doi: 10.1039/c9sc03666k. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 41.Yu S., Li H., Zhang P., et al. Site-specific template generative approach for retrosynthetic planning. Nat. Commun. 2024;15 doi: 10.1038/s41467-024-52048-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 42.Strieth-Kalthoff F., Szymkuć S., Molga K., et al. Artificial intelligence for retrosynthetic planning needs both data and expert knowledge. J. Am. Chem. Soc. 2024;146:11005–11017. doi: 10.1021/jacs.4c00338. [DOI] [PubMed] [Google Scholar]
  • 43.B.H. Chen, C.T. Li, H.J. Dai, et al., Retro*: Learning retrosynthetic planning with neural guided A* search, arXiv. 2020. 10.48550/arXiv.2006.15820. [DOI]
  • 44.Knox C., Wilson M., Klinger C.M., et al. DrugBank 6.0: The DrugBank knowledgebase for 2024. Nucleic Acids Res. 2024;52:D1265–D1275. doi: 10.1093/nar/gkad976. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 45.Jones J.H., Xin Z., Himmelbauer M., et al. Discovery of potent, selective, and brain-penetrant apoptosis signal-regulating kinase 1 (ASK1) inhibitors that modulate brain inflammation in vivo. J. Med. Chem. 2021;64:15402–15419. doi: 10.1021/acs.jmedchem.1c01458. [DOI] [PubMed] [Google Scholar]
  • 46.Chen J., Hu C., Yang H., et al. PMS2 amplification contributes brain metastasis from lung cancer. Biol. Proced. Online. 2024;26 doi: 10.1186/s12575-024-00238-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 47.Jiang M., Yan L., Li M., et al. Computer-aided investigation of traditional Chinese medicine mechanisms: A case study of San-ao decoction in asthma treatment. Comput. Biol. Med. 2024;169 doi: 10.1016/j.compbiomed.2023.107868. [DOI] [PubMed] [Google Scholar]
  • 48.Blakemore D.C., Castro L., Churcher I., et al. Organic synthesis provides opportunities to transform drug discovery. Nat. Chem. 2018;10:383–394. doi: 10.1038/s41557-018-0021-z. [DOI] [PubMed] [Google Scholar]
  • 49.Ertl P., Schuffenhauer A. Estimation of synthetic accessibility score of drug-like molecules based on molecular complexity and fragment contributions. J. Cheminform. 2009;1 doi: 10.1186/1758-2946-1-8. [DOI] [PMC free article] [PubMed] [Google Scholar]

Associated Data

Supplementary Materials

Multimedia component 1
mmc1.docx (612KB, docx)

Data Availability Statement

druglikeFilter is a versatile deep learning-based tool for drug-likeness assessment and is freely accessible at https://idrblab.org/drugfilter/.


Articles from Journal of Pharmaceutical Analysis are provided here courtesy of Xi'an Jiaotong University
