Nucleic Acids Research. 2024 May 23;52(W1):W439–W449. doi: 10.1093/nar/gkae424

ChemFH: an integrated tool for screening frequent false positives in chemical biology and drug discovery

Shaohua Shi 1,2,4, Li Fu 3,4, Jiacai Yi 4,4, Ziyi Yang 5, Xiaochen Zhang 6, Youchao Deng 7, Wenxuan Wang 8, Chengkun Wu 9, Wentao Zhao 10, Tingjun Hou 11, Xiangxiang Zeng 12, Aiping Lyu 13, Dongsheng Cao 14
PMCID: PMC11223804  PMID: 38783035

Abstract

High-throughput screening rapidly tests an extensive array of chemical compounds to identify hit compounds for specific biological targets in drug discovery. However, false-positive results disrupt hit compound screening, leading to wastage of time and resources. To address this, we propose ChemFH, an integrated online platform facilitating rapid virtual evaluation of potential false positives, including colloidal aggregators, spectroscopic interference compounds, firefly luciferase inhibitors, chemical reactive compounds, promiscuous compounds, and other assay interferences. By leveraging a dataset containing 823 391 compounds, we constructed high-quality prediction models using multi-task directed message-passing neural network (DMPNN) architectures combined with uncertainty estimation, yielding an average AUC value of 0.91. Furthermore, ChemFH incorporated 1441 representative alert substructures derived from the collected data and ten commonly used frequent hitter screening rules. ChemFH was validated with an external set of 75 compounds. Subsequently, the virtual screening capability of ChemFH was successfully confirmed through its application to five virtual screening libraries. Furthermore, ChemFH underwent additional validation on two natural products and FDA-approved drugs, yielding reliable and accurate results. ChemFH is a comprehensive, reliable, and computationally efficient screening pipeline that facilitates the identification of true positive results in assays, contributing to enhanced efficiency and success rates in drug discovery. ChemFH is freely available via https://chemfh.scbdd.com/.

Graphical Abstract


Introduction

The process of developing a new drug can usually be both time-consuming and costly. To enhance efficiency, high throughput screening (HTS) and virtual screening (VS) technologies are being assigned more prominent roles in the initial stages of drug discovery. In practical applications, true positive compounds identified by HTS typically constitute only 0.01–0.1% of the total samples in the screening databases (1). Moreover, for certain screening systems, over 95% of positive results are attributed to false positives or unexpected outcomes derived from the shared physicochemical properties of compounds or interfering factors (1,2). Such compounds frequently appear in different types of HTS and are referred to as frequent hitters (FHs) (3).

Common interference mechanisms in FH include colloidal aggregation, disruption of spectroscopic detection methods (e.g. autofluorescence and luciferase inhibition), and chemical reactivity (3). Several studies have reported the presence of a substantial number of interfering compounds within both molecular libraries and the results obtained from HTS. These studies cover assay interference mechanisms such as aggregation, fluorescence, and firefly luciferase (FLuc) reporter enzyme inhibition (4–6). In response to the significant challenge posed by FH resulting from diverse interference mechanisms, Baell published a commentary article titled ‘Chemistry: Chemical con artists foil drug discovery’ in Nature in 2014 (7). The article underscored the impact of assay interference compounds, aptly referred to as chemical con artists, in impeding the drug development process, leading to a substantial waste of research time and resources.

In 2017, nine editors-in-chief of American Chemical Society journals further emphasized the harm caused by false-positive compounds resulting from assay interference in a paper titled ‘The Ecstasy and Agony of Assay Interference Compounds.’ (8) The paper advised researchers to remain vigilant against potential false positives and emphasized the need to confirm the authenticity of positive screening results. Recognizing and addressing the frequent occurrences of false positives in high-throughput screening is crucial for reducing ineffective investments, improving screening hit rates, and enhancing drug development efficiency.

To mitigate attrition caused by false positive results, experimental detection techniques have been devised based on different characteristics of interfering compounds, such as adding nonionic detergents, using scavenging reagents, performing orthogonal approaches, or running counter-screen assays. However, implementing these methods requires specific experimental conditions that are expensive and time-consuming. Consequently, a more effective strategy involves leveraging computational prediction tools as an initial step for FH detection before the primary screening. Several such screening rules and prediction models have been constructed for public applications, including pan-assay interference compounds (PAINS), Aggregator Advisor, Luciferase Advisor, ALARM NMR, FAF-Drugs4, Badapple, Hit Dexter 3.0, Lilly-MedChem, ChemAGG, ChemFLuc, ChemFluo, among others (9–17). Among these tools, PAINS is a representative filtering approach with 480 structural alerts designed to detect false-positive compounds (13). However, despite widespread use in drug design, these screening tools often suffer from limitations such as small training datasets, ambiguous substructure screening endpoints, a lack of advanced modeling methods, and insufficient false hit detection mechanisms, substantially compromising their efficiency and utility. For instance, the specific endpoints underlying PAINS rules remain unknown, posing a considerable challenge to result validation and subsequent processing strategies (18–21). Therefore, there is a compelling need to develop high-accuracy FH prediction models or tools for the early stages of drug discovery.

In this context, we introduce ChemFH, the first integrated online platform for comprehensive potential FH prediction and screening. The database, compiled through literature collection and database mining (3,10,11,22), consisted of 823 391 compounds, covering various assay interferences, including colloidal aggregators, FLuc inhibitors, fluorescent compounds, chemical reactive compounds, and promiscuous compounds. Based on this extensive database, we developed FH prediction models and representative substructure rules with clearly defined FH mechanisms. To construct credible FH prediction models, we employed the Chemprop package (23), which features a directed message passing neural network (DMPNN) architecture. In addition, 102 representative substructure rules with high detection precision were summarized as a supplementary tool for the prediction models. Ten commonly used FH screening rules were also included in ChemFH for more comprehensive FH detection. Subsequently, we conducted extensive FH screenings across five virtual screening databases to validate ChemFH, examine FH distribution with different interference mechanisms, and offer insights on using these databases. Furthermore, ChemFH was applied to two representative chemicals, curcumin and chaetocin, to exemplify its utility and demonstrate its reliability. ChemFH was also compared with several other well-constructed FH prediction platforms. We believe that, applied rationally, ChemFH has the potential to substantially triage compound interference, thereby enhancing efficiency and success rate at the early stage of drug discovery. The ChemFH platform is accessible at https://chemfh.scbdd.com/.

Materials and methods

Data collection

We conducted a thorough review of the relevant literature and databases, including ZINC, ChEMBL, BindingDB and PubChem Bioassay (24–27), to compile a comprehensive dataset of FH. In addition, we collected a significant number of non-interfering compounds as negative sets. To guarantee the quality of the benchmark dataset, all molecules underwent a rigorous multi-step data preparation process. First, salts and compounds lacking structures were eliminated. Then, all compounds underwent standardization at pH 7.0 using the ‘wash’ function of the Molecular Operating Environment software (MOE 2022). This process aimed to eliminate minor components, deprotonate strong acids, protonate strong bases, and add explicit hydrogens for uniformity. Lastly, duplicated molecules were removed, and compounds present in both positive and negative datasets were excluded. Thus, a final dataset comprising 823 391 compounds was assembled for model development and substructure construction (see Supplementary Data S1).
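The last preparation step (removing duplicates and excluding compounds that appear in both the positive and negative sets) can be illustrated with a minimal sketch. It assumes structures have already been standardized upstream and are represented as canonical strings; the function name `curate` is hypothetical and is not part of the ChemFH code base.

```python
def curate(positives, negatives):
    """Drop duplicates, then drop structures that carry both labels.

    Inputs are iterables of standardized structure strings (for example
    canonical SMILES); standardization itself is assumed to be done already.
    """
    pos = set(positives)        # removes duplicates within the positive set
    neg = set(negatives)        # removes duplicates within the negative set
    conflicting = pos & neg     # same structure labeled positive and negative
    return sorted(pos - conflicting), sorted(neg - conflicting)


# Toy example: benzene appears in both sets and is excluded from both.
pos, neg = curate(["CCO", "CCO", "c1ccccc1"], ["c1ccccc1", "CCN"])
print(pos, neg)
```

Comparing strings only works if both sets were standardized identically, which is why this step comes last in the pipeline described above.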

The prediction performance of models and the quality of generated substructure rules are usually significantly influenced by the structural diversity and distribution of chemical space in datasets (24). Therefore, we conducted a Murcko scaffold analysis to investigate the molecular diversity and chemical space of the initial datasets (see Supplementary Table S1 for details). Our analysis revealed abundant and diverse Murcko scaffolds across different datasets, with over 85% of scaffolds matching fewer than five molecules. The average number of corresponding molecules per scaffold for the whole dataset was below three, indicating high molecular diversity and broad coverage of chemical space, which supports the accuracy and robustness of ChemFH for FH prediction.
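The two diversity statistics quoted above (the fraction of scaffolds matching fewer than five molecules and the mean number of molecules per scaffold) can be computed as in the following sketch. It assumes one Murcko scaffold string per molecule has already been generated by a cheminformatics toolkit; `scaffold_stats` and the toy labels are illustrative only.

```python
from collections import Counter


def scaffold_stats(scaffolds, max_members=5):
    """Summarize scaffold diversity from one scaffold string per molecule."""
    if not scaffolds:
        raise ValueError("at least one scaffold is required")
    counts = Counter(scaffolds)                      # molecules per scaffold
    n_scaffolds = len(counts)
    small = sum(1 for c in counts.values() if c < max_members)
    return {
        "n_scaffolds": n_scaffolds,
        "fraction_small": small / n_scaffolds,       # scaffolds with < max_members molecules
        "mean_per_scaffold": len(scaffolds) / n_scaffolds,
    }


# Toy data: scaffold "A" covers six molecules, the others one or two.
stats = scaffold_stats(["A"] * 6 + ["B", "B", "C", "D"])
print(stats)
```

A high `fraction_small` together with a low `mean_per_scaffold` corresponds to the pattern reported here: many scaffolds, each represented by few molecules.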

Model construction

In the framework of ChemFH, we utilized DMPNN, a subclass of graph convolutional neural network, implemented through Chemprop, a package centered on DMPNN. DMPNN learns molecular encodings using bond-centered convolutions, avoiding unnecessary loops during message passing. The DMPNN method and its derivatives have demonstrated successful applications across diverse drug discovery domains, particularly in antibiotics discovery (28–30). Multi-task DMPNN typically outperforms single-task DMPNN because it can leverage information shared across multiple tasks. Additionally, multi-task models have an advantage over single-task models in terms of runtime, as shown in several studies (31,32). In this work, we employed Chemprop's multi-task method to train all endpoint tasks for ChemFH simultaneously. Recent studies showed that combining DMPNN with external features enhances performance, benefiting from the global information of descriptors and local information from DMPNN (33,34). Hence, in addition to the naïve DMPNN model, we constructed two more models by incorporating additional features into DMPNN: DMPNN combined with RDKit 2D descriptors (hereinafter referred to as DMPNN-Des) and DMPNN combined with Morgan fingerprints (hereinafter referred to as DMPNN-FP). The Adam optimizer (35) was applied for training the models, with Bayesian optimization adopted for hyperparameter optimization. The comprehensive details of the optimal hyperparameters for DMPNN, DMPNN-Des, and DMPNN-FP models are presented in Supplementary Table S2. Metrics such as area under the curve (AUC), accuracy (ACC), balanced accuracy (BA), specificity (SP), sensitivity (SE) and Matthews correlation coefficient (MCC) were employed to assess the performance of the models.
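For reference, the evaluation metrics listed above follow their standard definitions. The sketch below computes ACC, BA, SP, SE and MCC from confusion-matrix counts, and AUC as the probability that a randomly chosen positive is scored above a randomly chosen negative. The function names are illustrative and this is not the evaluation code used in the study.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Threshold-dependent metrics from confusion-matrix counts."""
    se = tp / (tp + fn)                      # sensitivity (true positive rate)
    sp = tn / (tn + fp)                      # specificity (true negative rate)
    acc = (tp + tn) / (tp + fp + tn + fn)    # accuracy
    ba = (se + sp) / 2                       # balanced accuracy
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"ACC": acc, "BA": ba, "SP": sp, "SE": se, "MCC": mcc}


def roc_auc(labels, scores):
    """AUC via pairwise comparison of positives and negatives (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


m = classification_metrics(tp=80, fp=10, tn=90, fn=20)
auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
print(m, auc)
```

BA and MCC are reported alongside ACC because several of the datasets are imbalanced, in which case accuracy alone can look high even when sensitivity is modest.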

Substructure rule collection

The conventional evaluation parameters of Quantitative Structure-Activity Relationship (QSAR) models are designed to recognize broad relationships rather than specific substructural effects, leading to information loss. To address this issue, representative substructures can be derived and utilized as complementary tools for prediction models. This approach aims to enhance interpretability and provide a more comprehensive perspective for FH detection (36). Using our group's developed automatic structure derivation tool, PySmash (https://github.com/kotori-y/pySmash) (37), on the collected datasets, we identified and summarized 102 representative alert substructures with an average precision score exceeding 0.7 for FH structural screening (refer to Supplementary Table S3 and S4). For more comprehensive FH detection, ChemFH also provided ten commonly used FH screening rules, including PAINS, BMS, GST/GSH FH filter, His-tagged protein FH filter, ALARM NMR, Luciferase inhibitor Rule, Chelator Rule, NTD, potential electrophilic rule, and Lilly Medchem rules for screening undesirable substructures (refer to Supplementary Table S5) (13,14,36,38–45). While the derived substructures from large datasets showed high precision, their utility and accuracy in screening rely on the chemical space of specific databases (36). Relying solely on substructure rules is generally unreliable; hence, it is advisable to use them cautiously and as supplementary tools for prediction models.

Uncertainty estimation

Uncertainty estimation in predictive models is vital for assessing confidence, supporting informed decision-making, managing risks, and prioritizing experiments. It directly influences the practical utility and reliability of the model in real-world applications. Lower uncertainty indicates increased confidence, while higher uncertainty suggests a less reliable prediction for a specific molecule. In the ChemFH model, uncertainty is estimated using the Monte Carlo dropout approach (46). During training, dropout is applied before each layer, and it is kept active during inference, so that repeated forward passes with different random masks generate a distribution of predictions. Because this procedure approximates the posterior of a deep Gaussian process, the variance of that distribution serves as an estimate of predictive uncertainty (47). In this study, we applied the maximized Youden's index as the threshold for categorizing uncertainty estimation results (48): prediction uncertainties exceeding the threshold are labeled as ‘Low-confidence,’ whereas prediction uncertainties below this threshold are categorized as ‘High-confidence.’ The optimal uncertainty thresholds determined by the maximum Youden's index for the different FH mechanisms are listed in Supplementary Table S6.
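The thresholding procedure can be sketched as follows. The sketch assumes that the stochastic forward passes have already been run, and that Youden's index J = SE + SP − 1 is maximized with "the prediction was wrong" treated as the positive class. The text does not spell out that second detail, so it is an assumption of this illustration, as are all function names.

```python
from statistics import mean, pvariance


def mc_summary(passes):
    """Mean probability and variance over stochastic forward passes (one molecule)."""
    return mean(passes), pvariance(passes)


def youden_threshold(uncertainties, is_wrong):
    """Uncertainty threshold maximizing J = SE + SP - 1.

    Assumption: the positive class is 'prediction was wrong', and a molecule
    is flagged when its uncertainty exceeds the threshold.
    """
    n_pos = sum(1 for w in is_wrong if w)
    n_neg = len(is_wrong) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both correct and incorrect predictions")
    best_t, best_j = None, float("-inf")
    for t in sorted(set(uncertainties)):     # candidate thresholds: observed values
        tp = sum(1 for u, w in zip(uncertainties, is_wrong) if w and u > t)
        tn = sum(1 for u, w in zip(uncertainties, is_wrong) if not w and u <= t)
        j = tp / n_pos + tn / n_neg - 1.0
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j


def confidence_label(uncertainty, threshold):
    """Map an uncertainty value to the labels used in the ChemFH output."""
    return "Low-confidence" if uncertainty > threshold else "High-confidence"


prob, var = mc_summary([0.2, 0.4, 0.6])
t, j = youden_threshold(
    [0.01, 0.02, 0.03, 0.10, 0.12, 0.20],
    [False, False, False, True, False, True],
)
print(prob, var, t, j, confidence_label(0.10, t))
```

In practice one threshold would be fitted per FH mechanism on held-out data, matching the per-mechanism thresholds reported in Supplementary Table S6.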

Results

Webserver development

The ChemFH web server was built using Django in Python 3.9 and was hosted on a high-performance Nginx web server running Alibaba Cloud Linux 3. uWSGI served as the intermediary between Django and Nginx. The implementation of ChemFH followed the Model-Template-View (MTV) design pattern, consisting of three layers: the model layer, the view layer, and the template layer. The model layer interacts with an SQLite3 database, which was employed for storing uploaded files, constructing models, and predicting properties. The view layer contains the primary logic code, facilitating access to prediction models, managing multi-prediction tasks, and handling file uploads and downloads. The template layer presents the front-end pages, including result visualization, page rendering, document integration, and more. In addition, the cheminformatics toolkit RDKit was used for the preprocessing and format conversion of molecular structures in ChemFH. The Chemprop package was utilized for building and deploying graph deep learning models. A summarized list of the development environment for ChemFH can be found in Supplementary Table S7. The website has been tested thoroughly to ensure its functionality across multiple operating systems and web browsers.

ChemFH webserver workflow

ChemFH offers a user-friendly web interface for FH evaluation, featuring two aspects: model prediction and substructure alert. With over three years of operation, the server has received over 400 thousand global views, confirming its widespread utility. ChemFH can predict around 2500 molecules per minute, with variations based on settings. The workflow of ChemFH is presented in Figure 1.

Figure 1. The workflow of ChemFH webserver.

Input

To evaluate single or in-batch molecules, two input options were provided for FH evaluation based on the number of query molecules: Evaluation Mode and Screening Mode.

The Evaluation Mode provides virtual evidence for authenticating positive results in a biological campaign, allowing submission through editing a single SMILES string or drawing the molecule. In contrast, the Screening Mode is more suitable for detecting potential interference before assay design, with file uploading in formats like .sdf/.csv/.txt, allowing for multiple molecules.

Output

The output panel in FH Evaluation mode comprises three modules: Visualization, ‘Frequent Hitter Mechanisms’ and ‘Frequent Hitter Rules’. The Visualization module displays the prediction overview, including a 2D structure graph of the query molecule and a radar chart for model prediction results. The ‘Frequent Hitter Mechanisms’ module contains the results from the seven high-performance prediction models corresponding to the seven FH mechanisms and dataset-derived substructure rules. A query molecule is labeled as ‘Reject’ (FH compound) or ‘Pass’ (Non-FH compound) with uncertainty estimation denoted as ‘Low-confidence’ or ‘High-confidence’. Along with them, the number of the derived alert substructures and their highlights are also presented. The ‘Frequent Hitter Rules’ module includes the results from 10 commonly used FH screening rules (PAINS, BMS, GST/GSH FH filter, His-tagged protein FH filter, ALARM NMR, Luciferase inhibitor Rule, Chelator Rule, NTD, Potential electrophilic Rule, and Lilly Medchem Rules) for additional analysis. The results are presented in the number of alert substructures and their highlights. All the above-mentioned information can be downloaded from the page in .csv format. Users can also download a PDF report of the calculation results from the webpage. In addition to the information displayed on the webpage, the report includes detailed explanations of each FH mechanism and introduces the ten commonly used FH screening rules, offering users a comprehensive and easily understandable report for the query molecule.

In FH Screening mode, the results, including molecule graphs and SMILES of each molecule, are presented in a list. Users can view detailed evaluations for each molecule by clicking ‘View’ in the ‘Detail’ column. The overall results can be saved as a .csv file for further analysis.

Application Programming Interface

The Application Programming Interface (API) introduced in ChemFH facilitates efficient command-line access for researchers and is beneficial for handling extensive datasets. This accessibility is achieved through well-established protocols compatible with popular programming languages, simplifying interactions with the web server. Users can conveniently retrieve comprehensive results using a simple script, and detailed code examples are available in the ‘API Tutorial’ section on the website. ChemFH’s API receives SMILES strings and returns FH prediction results for seven mechanisms and ten FH filter rules. The highlighted substructures from FH rules screening are also returned in the data. Notably, numeric scores of uncertainty estimation are exclusively available in API functionality. The API’s flexibility encourages researchers to use its functionality for diverse applications, such as repositories, graphic user interfaces, and web applications for FH evaluation.

Performance of FH prediction models

In this study, we used Chemprop to train three types of FH prediction models: DMPNN, DMPNN-Des and DMPNN-FP. The training was based on a substantial collection of molecular graphs for each FH mechanism. To ensure the generalizability of the prediction models, we partitioned the collected datasets into training, validation and test sets at a ratio of 8:1:1. To obtain a stable estimate of prediction performance, we repeated the splitting process ten times before training and calculated the standard deviation of each statistic. The prediction performance for different FH mechanisms with different models is summarized in Table 1.
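The repeated 8:1:1 partitioning can be sketched as below. The sketch uses a plain random shuffle with a different seed per repeat; whether the study used random or scaffold-based splitting is not stated here, so the random split is an assumption, and `split_811` is a hypothetical helper.

```python
import random


def split_811(items, seed):
    """Shuffle with a fixed seed and cut into 80% / 10% / 10% partitions."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = n * 8 // 10          # integer arithmetic avoids float rounding
    n_val = n // 10
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


# Ten repeats with different seeds, as in the protocol described above.
splits = [split_811(range(1000), seed) for seed in range(10)]
print([tuple(len(part) for part in s) for s in splits])
```

Training one model per split and taking the mean and standard deviation of each metric over the ten test sets gives values in the "mean ± SD" form used in Table 1.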

Table 1. Performance of DMPNN, DMPNN-FP and DMPNN-Des on the test set

| Dataset | Model | AUC | ACC | BA | SP | SE | MCC |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Colloidal aggregators | DMPNN | 0.933 ± 0.003 | 0.882 ± 0.004 | 0.861 ± 0.005 | 0.921 ± 0.013 | 0.801 ± 0.019 | 0.729 ± 0.009 |
| Colloidal aggregators | DMPNN-FP | 0.934 ± 0.003 | 0.886 ± 0.004 | 0.862 ± 0.004 | 0.929 ± 0.013 | 0.796 ± 0.019 | 0.737 ± 0.007 |
| Colloidal aggregators | DMPNN-Des | 0.938 ± 0.004 | 0.891 ± 0.004 | 0.871 ± 0.005 | 0.928 ± 0.015 | 0.813 ± 0.022 | 0.749 ± 0.006 |
| FLuc inhibitors | DMPNN | 0.979 ± 0.002 | 0.971 ± 0.001 | 0.901 ± 0.009 | 0.986 ± 0.002 | 0.817 ± 0.019 | 0.820 ± 0.007 |
| FLuc inhibitors | DMPNN-FP | 0.978 ± 0.003 | 0.971 ± 0.002 | 0.906 ± 0.008 | 0.986 ± 0.001 | 0.827 ± 0.016 | 0.823 ± 0.011 |
| FLuc inhibitors | DMPNN-Des | 0.979 ± 0.002 | 0.971 ± 0.001 | 0.908 ± 0.006 | 0.985 ± 0.002 | 0.831 ± 0.013 | 0.824 ± 0.008 |
| Blue fluorescent compounds | DMPNN | 0.946 ± 0.005 | 0.953 ± 0.004 | 0.864 ± 0.014 | 0.979 ± 0.005 | 0.748 ± 0.03 | 0.757 ± 0.019 |
| Blue fluorescent compounds | DMPNN-FP | 0.941 ± 0.007 | 0.952 ± 0.003 | 0.865 ± 0.012 | 0.977 ± 0.004 | 0.753 ± 0.026 | 0.752 ± 0.016 |
| Blue fluorescent compounds | DMPNN-Des | 0.947 ± 0.004 | 0.953 ± 0.003 | 0.874 ± 0.008 | 0.976 ± 0.005 | 0.773 ± 0.019 | 0.761 ± 0.015 |
| Green fluorescent compounds | DMPNN | 0.727 ± 0.008 | 0.744 ± 0.042 | 0.661 ± 0.006 | 0.790 ± 0.064 | 0.532 ± 0.064 | 0.283 ± 0.026 |
| Green fluorescent compounds | DMPNN-FP | 0.722 ± 0.006 | 0.747 ± 0.040 | 0.654 ± 0.006 | 0.799 ± 0.062 | 0.509 ± 0.064 | 0.275 ± 0.021 |
| Green fluorescent compounds | DMPNN-Des | 0.729 ± 0.007 | 0.751 ± 0.020 | 0.659 ± 0.008 | 0.801 ± 0.033 | 0.517 ± 0.041 | 0.281 ± 0.014 |
| Reactive compounds | DMPNN | 0.991 ± 0.002 | 0.973 ± 0.004 | 0.973 ± 0.004 | 0.974 ± 0.007 | 0.972 ± 0.008 | 0.946 ± 0.008 |
| Reactive compounds | DMPNN-FP | 0.993 ± 0.002 | 0.971 ± 0.003 | 0.970 ± 0.003 | 0.977 ± 0.006 | 0.964 ± 0.008 | 0.941 ± 0.006 |
| Reactive compounds | DMPNN-Des | 0.992 ± 0.003 | 0.972 ± 0.005 | 0.972 ± 0.006 | 0.976 ± 0.005 | 0.967 ± 0.01 | 0.944 ± 0.01 |
| Promiscuous compounds | DMPNN | 0.937 ± 0.005 | 0.882 ± 0.006 | 0.880 ± 0.006 | 0.89 ± 0.015 | 0.871 ± 0.017 | 0.761 ± 0.012 |
| Promiscuous compounds | DMPNN-FP | 0.935 ± 0.006 | 0.875 ± 0.009 | 0.874 ± 0.008 | 0.881 ± 0.02 | 0.867 ± 0.014 | 0.748 ± 0.018 |
| Promiscuous compounds | DMPNN-Des | 0.936 ± 0.007 | 0.878 ± 0.008 | 0.877 ± 0.007 | 0.892 ± 0.019 | 0.862 ± 0.013 | 0.754 ± 0.016 |
| Assay interference | DMPNN | 0.846 ± 0.027 | 0.764 ± 0.046 | 0.774 ± 0.035 | 0.717 ± 0.102 | 0.831 ± 0.044 | 0.545 ± 0.065 |
| Assay interference | DMPNN-FP | 0.832 ± 0.014 | 0.750 ± 0.021 | 0.764 ± 0.015 | 0.684 ± 0.06 | 0.845 ± 0.048 | 0.524 ± 0.028 |
| Assay interference | DMPNN-Des | 0.844 ± 0.027 | 0.770 ± 0.037 | 0.777 ± 0.032 | 0.736 ± 0.066 | 0.818 ± 0.036 | 0.548 ± 0.063 |

As shown in Table 1, the prediction models exhibited high detection ability for FH, with average AUC and accuracy values of 0.91 and 0.88, respectively. Overall, the three types of models exhibit comparable performance, with the DMPNN-Des models showing better performance in five out of the seven FH mechanisms. Based on the above performance, ChemFH adopted DMPNN-Des models as the final choice for optimal prediction models.
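The reported averages can be reproduced from the DMPNN-Des rows of Table 1. The short check below copies the mean AUC and ACC values for the seven mechanisms from the table and averages them; it is a worked arithmetic example only.

```python
# (AUC, ACC) means for the DMPNN-Des model, copied from Table 1.
dmpnn_des = {
    "Colloidal aggregators": (0.938, 0.891),
    "FLuc inhibitors": (0.979, 0.971),
    "Blue fluorescent compounds": (0.947, 0.953),
    "Green fluorescent compounds": (0.729, 0.751),
    "Reactive compounds": (0.992, 0.972),
    "Promiscuous compounds": (0.936, 0.878),
    "Assay interference": (0.844, 0.770),
}

mean_auc = sum(auc for auc, _ in dmpnn_des.values()) / len(dmpnn_des)
mean_acc = sum(acc for _, acc in dmpnn_des.values()) / len(dmpnn_des)
print(round(mean_auc, 2), round(mean_acc, 2))  # 0.91 0.88
```

The average hides a wide spread: the reactive-compound model reaches an AUC above 0.99, while the green-fluorescence model is close to 0.73, so per-mechanism values should be consulted when interpreting a specific prediction.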

To further validate the ChemFH platform's capability in identifying compounds with various interference mechanisms, we performed an FH evaluation using our web server on an external dataset of 75 assay compounds sourced from literature reports and representing seven different mechanisms (6,49–53). The results illustrated in Figure 2 show that the model for each mechanism can accurately identify molecules that match their experimental interference mechanisms. Out of the 75 compounds, only two reactive compounds (No. 3 and No. 8) and two promiscuous compounds (No. 48 and No. 49) were misclassified by the model. Further examination of Figure 2 revealed that these molecules were often associated with multiple mechanisms simultaneously rather than exclusively related to a single mechanism. This indicates a notable prevalence of shared characteristics or interconnections among the identified mechanisms. For instance, compound 67, a FLuc inhibitor, may also be a green fluorescent compound, a blue fluorescent compound, or a colloidal aggregator. The simultaneous occurrence of multiple interfering mechanisms emphasizes the need for researchers to exercise caution when selecting compounds.

Figure 2. A heatmap depicting ChemFH predictions for 75 compounds with distinct interfering mechanisms. Each column denotes an individual molecule, and each row depicts predictions for a specific mechanism. White cells indicate negative predictions, while colored cells represent positive model predictions. Varied colors delineate molecules associated with different mechanisms.

Application examples

This section presents a practical database screening application and two representative compound evaluations. These examples illustrate the robustness, validity and practical utility of ChemFH in real-world scenarios.

Large database screening

Commercially available chemical libraries provide an efficient and cost-effective way to explore the chemical space of drug-like compounds. To comprehensively understand the proportion and distribution of FHs within these libraries, we screened seven mechanisms using ChemFH on five widely utilized virtual screening libraries, each containing a substantial number of compounds ranging from 500 000 to 1 800 000. The tested libraries include Asinex (522 390 compounds), Chembridge (1 557 938 compounds), ChemDiv (1 418 192 compounds), COCONUT (1 779 483 compounds) and Life Chemicals (509 974 compounds).

As depicted in Figure 3, the overall distributions of prediction results for the seven FH mechanisms across the four commercial databases and the one publicly available natural products database shared similar patterns. Among these five databases, colloidal aggregators constituted the predominant number of positive predictions. This aligns with the well-established recognition of colloidal aggregation as a major contributor to false positive results in HTS (2,54). Following were blue/green fluorescent compounds, FLuc inhibitors, and other interfering compounds, with promiscuous and reactive compounds accounting for the smallest proportion. Notably, Life Chemicals, ChemDiv, and COCONUT display a roughly 10% prevalence of colloidal aggregators, consistent with prior studies (55). In contrast, the percentage of predicted high-score promiscuous compounds remained consistently low across all five libraries, with COCONUT the highest at nearly 0.2%, indicating the scarcity of genuinely potent polypharmacology compounds.

Figure 3. The distribution of the results from five large database screenings on seven mechanisms. In the figure, ‘Agg’ refers to colloidal aggregators, ‘Blue’ refers to blue fluorescence, ‘Green’ refers to green fluorescence, ‘Rea.’ refers to reactive compounds, ‘Pro.’ refers to promiscuity, ‘Fluc’ refers to FLuc inhibitors, and ‘Others’ refers to other assay interferences.

To better understand the structural characteristics of these FHs, we conducted an analysis using the ChemDiv database as a case study. We examined the occurrence frequencies of compounds predicted as positive in various mechanisms and checked the top four FH alert substructures. The results are illustrated in Supplementary Figure S1. In chemical reactive compounds, the Quinone_A substructure, which has been shown to be a representative redox-active substructure (56), exhibited the highest occurrence. Colloidal aggregation compounds, which tend to self-aggregate in solution, typically possess more lipophilic moieties (10), aligning with the characteristics of the depicted substructures. Regarding promiscuous compounds, the analysis revealed that the substructures Ene_rhod_A, Ene_six_het_A, Imine_one_A, and Anil_di_alk_D were the most frequently occurring. Consistent with findings by Lewis et al. (57), these substructures align with the characteristic features commonly associated with promiscuous compounds. As for FLuc inhibitors, two recurring substructures, Hzone_phenol_A and Hzone_phenol_B, were identified as prevalent motifs (21). Generally, fluorescent compounds must meet two prerequisites: strong absorption within the visible light spectrum and effective fluorescence emission. Notably, some of the larger conjugated rings served as indicative features associated with fluorescent compounds, as shown in the figure (21). For compounds associated with other interfering mechanisms, the representative substructures align with the findings reported by Yang et al. (21).

Evaluation of curcumin and chaetocin

In this section, two representative FH natural products, curcumin and chaetocin, known for their potential therapeutic activities and false positive characteristics, are used to illustrate the utility and reliability of ChemFH. Curcumin is a pigment derived from turmeric (Curcuma longa) that has been widely reported for its purported medicinal properties. Biomedical exploration of curcumin has attracted significant attention, with over 120 clinical trials and more than $150 million in associated funding to date (58). Despite these efforts, the success rate of double-blinded, placebo-controlled clinical trials for curcumin has been notably low, prompting the hypothesis that this polyphenolic natural product may yield false signals (59). Thus, ChemFH was employed to gain insights into the potential FH characteristics of curcumin.

As shown in Supplementary Figure S2A, among the seven FH mechanisms, curcumin was predicted to be a colloidal aggregator, a reactive compound, and a FLuc inhibitor. Such results are not surprising since experiments have confirmed that curcumin is a strong colloidal aggregator with a critical aggregation concentration of 17 ± 0.44 μM and naturally possesses green-yellow fluorescence (60). Moreover, the prediction of curcumin as a FLuc inhibitor suggests that curcumin may have additional interference properties that require caution in related bioactivity assays. In their comprehensive review, Kathryn et al. concluded that oral curcuminoids lack observed efficacy, which has led to failures of clinical trials (41). Although the biophysical foundation for the observed low efficacy remains unknown, the ChemFH predictions offer a different perspective by treating curcumin as a frequent hitter in assays, which suggests an alternative research direction for addressing this issue.

Chaetocin, initially recognized in 2005 as a specific inhibitor of lysine-specific histone methyltransferases, is a fungal metabolite with potential anti-cancer properties (61). However, chaetocin has subsequently been found to be a nonselective and protein-reactive compound (62). As shown in Supplementary Figure S2B, chaetocin was predicted to be a colloidal aggregator, a blue fluorescent compound, a reactive compound, and associated with other assay interferences. Chaetocin contains a pair of disulfide bonds, a substructure that can confound assays through nonspecific redox behavior. This could explain its prediction as a chemical reactive compound. In 2013, researchers revealed that chaetocin forms covalent adducts with a diverse array of proteins (62,63). This observation offers a plausible explanation for the predicted aggregation behavior, as colloidal aggregators often yield false positive results due to nonspecific binding to target proteins.

It is necessary to emphasize that the predictive results of compounds in ChemFH serve as a reference, alerting researchers to the potential false positive nature of the compound. The definitive determination of the interfering nature of the compound should be conducted through various orthogonal wet-lab approaches, such as adding detergents or decoy proteins (64,65), decreasing the concentration of test compounds (66), etc.

Evaluation on FDA-approved drugs

While researchers must remain vigilant about experimental or predicted FHs, it is essential to understand that compounds predicted by ChemFH to be FHs or assay interferents are not necessarily inactive or undesirable, as many FDA-approved drugs have been reported to exhibit frequent hitter or assay interference properties (67). To assess ChemFH’s predictive ability in this context, we first excluded FDA-approved drugs from the training data, re-trained the model, and then conducted virtual screening on 2575 FDA-approved drugs collected from DrugBank. As shown in Supplementary Figure S3, the percentage of drugs exhibiting assay interference ranged from 3.65% to 6.44%, while promiscuous drugs comprised 15.03% of the total. The overall ratio of FHs among FDA-approved drugs was relatively low. Colloidal aggregation is a major contributor to FH behavior (68), and 166 drugs (6.44%) were predicted to be colloidal aggregators, slightly higher than the reported 3.6%. Evaluation of promiscuous drugs yielded a ratio similar to the 18% observed in a study using in silico docking, implying potential multi-target mechanisms contributing to polypharmacological effects or side effects (69). Drugs possessing at least one FH feature comprised 30.87% of the total. This means that, had these filters been applied strictly, 30.87% of current FDA-approved drugs would have been flagged and might have been discarded before approval, and their discovery missed. The figure decreased to 18.84% when excluding drugs exhibiting only promiscuous features, which are not necessarily indicative of false positives or assay interference. However, if only the high-confidence results among this 30.87% were retained, the overall percentage of FHs would be as low as 6.68%. Furthermore, we assessed ChemFH’s performance on 169 drugs or in-trial compounds with known FH profiles, comprising 86 aggregators and non-aggregators, 70 FLuc inhibitors, and 13 promiscuous drugs. The results revealed an average prediction accuracy of 0.923. Detailed information on the compounds and the prediction results can be found in Supplementary Data S2 and S3.
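To make the percentages above easier to interpret, the following back-of-the-envelope sketch converts them into approximate drug counts out of the 2575 screened drugs. The counts are derived here by simple rounding and are not figures reported by the authors; the dictionary labels are shorthand introduced for this example.

```python
TOTAL_DRUGS = 2575  # FDA-approved drugs screened, as stated in the text

# Percentages quoted in the text; the labels are shorthand for this sketch.
reported_percentages = {
    "at least one FH feature": 30.87,
    "excluding promiscuity-only drugs": 18.84,
    "high-confidence results only": 6.68,
    "promiscuous": 15.03,
}


def implied_count(percentage: float, total: int = TOTAL_DRUGS) -> int:
    """Approximate number of drugs implied by a reported percentage."""
    return round(total * percentage / 100)


implied = {label: implied_count(pct) for label, pct in reported_percentages.items()}

for label, count in implied.items():
    print(f"{label}: about {count} of {TOTAL_DRUGS} drugs")
```

Read this way, restricting attention to high-confidence predictions reduces the flagged set from roughly 795 drugs to roughly 172.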

Comparison with other web-based tools

We compared FH mechanism coverage, substructure alerts, batch evaluation/API support, explanation, uncertainty estimation, and processing efficiency among ChemFH, Aggregator Advisor, Luciferase Advisor, FAFDrugs4, Hit Dexter 3.0, ChemAGG and ChemFluc. Details are summarized in Table 2. The results showed that ChemFH outperformed the other webservers in FH mechanism coverage, utility, and efficiency. Most other webservers focus on a single FH mechanism, with Hit Dexter 3.0 being the only other platform that predicts two FH mechanisms: colloidal aggregators and promiscuous compounds. Regarding batch evaluation or API support, most webservers offer either batch evaluation through uploading a molecule list or API support. ChemFH stood out among these platforms as the only one providing users with both batch evaluation options on the webpage and API support independent of the webpage. By reporting prediction results together with uncertainty scores, ChemFH was also the only platform offering a confidence level for the evaluation results. In the runtime analysis, ChemFH processed 1000 molecules for seven FH mechanisms and rule screening in just 21 s, substantially faster than the other webservers. This can be attributed to the multi-task DMPNN architecture utilized in ChemFH, which enables the rapid processing of a large number of molecules.

Table 2. Comparison of the main features of ChemFH with other web-based platforms

| Features | ChemFH | Aggregator Advisor | Luciferase Advisor | Hit Dexter 3.0 | ChemAGG | ChemFluc |
|---|---|---|---|---|---|---|
| Colloidal aggregators prediction | Yes | Yes | No | Yes | Yes | No |
| FLuc inhibitors prediction | Yes | No | Yes | No | No | Yes |
| Fluorescent compounds prediction | Yes | No | No | No | No | No |
| Chemical reactive compounds prediction | Yes | No | No | No | No | No |
| Promiscuous compounds prediction | Yes | No | No | Yes | No | No |
| Substructure alerts/PAINS | Yes | No | No | Yes | No | No |
| Batch evaluation/API support | +++ | +++ | ++ | ++ | ++ | ++ |
| Explanation | +++ | + | ++ | + | ++ | ++ |
| Uncertainty estimation | Yes | No | No | No | No | No |
| Availability | Free | Free | Free | Free | Free | Free |
| Computation time (1000 molecules) | 21 s | 424 s | 437 s | >500 min | 233 s | 202 s |

*A higher number of ‘+’ symbols indicates better support for the respective item. The runtime of each platform was measured ten times, and the average value is reported.

Luciferase Advisor: http://ochem.eu/
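The runtimes in Table 2 can be turned into throughput and relative speed-up figures with a few lines of arithmetic. The sketch below uses only the values from the table; the variable names are introduced here, and the Hit Dexter 3.0 entry is treated as a lower bound because the table reports it as ">500 min".

```python
MOLECULES = 1000  # batch size used for the runtime comparison in Table 2

# Average runtimes in seconds, taken from Table 2.
runtimes_s = {
    "ChemFH": 21,
    "Aggregator Advisor": 424,
    "Luciferase Advisor": 437,
    "ChemAGG": 233,
    "ChemFluc": 202,
}
# Reported only as ">500 min", so this is a lower bound in seconds.
HIT_DEXTER_MIN_RUNTIME_S = 500 * 60

chemfh_throughput = MOLECULES / runtimes_s["ChemFH"]  # molecules per second

# How many times longer each other platform took than ChemFH.
speedup = {
    name: seconds / runtimes_s["ChemFH"]
    for name, seconds in runtimes_s.items()
    if name != "ChemFH"
}
hit_dexter_min_speedup = HIT_DEXTER_MIN_RUNTIME_S / runtimes_s["ChemFH"]

print(f"ChemFH throughput: {chemfh_throughput:.1f} molecules/s")
for name, factor in speedup.items():
    print(f"{name}: {factor:.1f}x the ChemFH runtime")
print(f"Hit Dexter 3.0: more than {hit_dexter_min_speedup:.0f}x the ChemFH runtime")
```

On these numbers ChemFH handles roughly 48 molecules per second, and the other platforms with second-scale runtimes take about 10 to 21 times longer for the same batch.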

Conclusions

The frequent appearance of false-positive results can seriously interfere with hit compound screening, leading to a considerable waste of time and resources. To address this obstacle to efficient drug discovery, we have developed ChemFH as an integrated online platform. It serves as a comprehensive tool for evaluating common false positives, facilitating the detection of potential false hits with robust and accurate performance. ChemFH also incorporates 1441 substructures, including representative alert substructures from collected data and those from ten commonly used FH screening rules, as a supplementary tool for FH detection and interpretation. The webserver offers an API for seamless workflow integration, enabling automated high-throughput screening. Additionally, uncertainty estimation methods are provided so that each result is accompanied by a confidence level, enhancing reliability and interpretability. It is anticipated that through the rational and extensive application of ChemFH, researchers can swiftly and effectively identify potential false positives, thereby enhancing the efficiency and success rate of drug discovery.
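As an illustration of what workflow integration might look like on the client side, the sketch below splits a SMILES list into batches and passes each batch to a caller-supplied function. Everything here is an assumption made for illustration: `batched`, `screen_in_batches`, and the `submit` callable are hypothetical names, the batch size is arbitrary, and no actual ChemFH endpoint, request format, or response schema is shown or implied.

```python
from typing import Callable, Dict, Iterator, List, Sequence


def batched(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def screen_in_batches(smiles: Sequence[str],
                      submit: Callable[[List[str]], List[Dict]],
                      batch_size: int = 1000) -> List[Dict]:
    """Send SMILES to `submit` batch by batch and collect the results in order.

    `submit` is a placeholder for whatever function talks to the screening
    service; its signature is an assumption of this sketch.
    """
    results: List[Dict] = []
    for batch in batched(smiles, batch_size):
        results.extend(submit(batch))
    return results


def fake_submit(batch: List[str]) -> List[Dict]:
    """Stand-in for a real service call; returns dummy records."""
    return [{"smiles": s, "flagged": False} for s in batch]


results = screen_in_batches(["CCO", "c1ccccc1", "CC(=O)O"], fake_submit, batch_size=2)
```

Keeping the network call behind an injected function means the batching logic can be tested offline and later reused with whichever client the service's own documentation describes.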

Supplementary Material

gkae424_Supplemental_Files

Acknowledgements

We acknowledge Haikun Xu and the High-Performance Computing Center of Central South University for their support. The study was approved by the university's review board.

Contributor Information

Shaohua Shi, Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, Hunan 410013, P.R. China; School of Chinese Medicine, Hong Kong Baptist University, Kowloon, Hong Kong SAR, 999077, P.R. China.

Li Fu, Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, Hunan 410013, P.R. China.

Jiacai Yi, School of Computer Science, National University of Defense Technology, Changsha, Hunan 410073, P.R. China.

Ziyi Yang, Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, Hunan 410013, P.R. China.

Xiaochen Zhang, School of Information Technology, Shangqiu Normal University, Shangqiu, Henan 476000, P.R. China.

Youchao Deng, Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, Hunan 410013, P.R. China.

Wenxuan Wang, Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, Hunan 410013, P.R. China.

Chengkun Wu, School of Computer Science, National University of Defense Technology, Changsha, Hunan 410073, P.R. China.

Wentao Zhao, School of Computer Science, National University of Defense Technology, Changsha, Hunan 410073, P.R. China.

Tingjun Hou, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, P.R. China.

Xiangxiang Zeng, College of Computer Science and Electronic Engineering, Hunan University, Changsha, Hunan 410082, P.R. China.

Aiping Lyu, School of Chinese Medicine, Hong Kong Baptist University, Kowloon, Hong Kong SAR, 999077, P.R. China.

Dongsheng Cao, Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, Hunan 410013, P.R. China.

Data availability

The data source used to build FH detection models and derived substructures is available in the Supporting Information. The ChemFH website is freely accessible to all users at https://chemfh.scbdd.com/, and there is no login requirement. Results are promptly displayed on the website and available for download in optional formats. The dataset and the code necessary to build and evaluate the models are at https://github.com/antwiser/ChemFH and https://zenodo.org/doi/10.5281/zenodo.11082970.

Supplementary data

Supplementary Data are available at NAR Online.

Funding

National Key Research and Development Program of China [2021YFF1201400]; National Natural Science Foundation of China [22173118, 22220102001]; Hunan Provincial Science Fund for Distinguished Young Scholars [2021JJ10068]; Science and Technology Innovation Program of Hunan Province [2021RC4011]; Natural Science Foundation of Hunan Province [2022JJ80104]; 2020 Guangdong Provincial Science and Technology Innovation Strategy Special Fund [2020B1212030006, Guangdong-Hong Kong-Macau Joint Lab]. Funding for open access charge: HKBU Strategic Development Fund project [SDF19-0402-P02].

Conflicts of interest statement

None declared.

References

  • 1. Thorne N., Auld D.S., Inglese J. Apparent activity in high-throughput screening: origins of compound-dependent assay interference. Curr. Opin. Chem. Biol. 2010; 14:315–324.
  • 2. Feng B.Y., Simeonov A., Jadhav A., Babaoglu K., Inglese J., Shoichet B.K., Austin C.P. A high-throughput screen for aggregation-based inhibition in a large compound library. J. Med. Chem. 2007; 50:2385–2390.
  • 3. Yang Z.Y., He J.H., Lu A.P., Hou T.J., Cao D.S. Frequent hitters: nuisance artifacts in high-throughput screening. Drug Discov. Today. 2020; 25:657–667.
  • 4. Babaoglu K., Simeonov A., Irwin J.J., Nelson M.E., Feng B., Thomas C.J., Cancian L., Costi M.P., Maltby D.A., Jadhav A. et al. Comprehensive mechanistic analysis of hits from high-throughput and docking screens against beta-lactamase. J. Med. Chem. 2008; 51:2502–2511.
  • 5. Simeonov A., Jadhav A., Thomas C.J., Wang Y., Huang R., Southall N.T., Shinn P., Smith J., Austin C.P., Auld D.S. et al. Fluorescence spectroscopic profiling of compound libraries. J. Med. Chem. 2008; 51:2363–2371.
  • 6. Thorne N., Shen M., Lea W.A., Simeonov A., Lovell S., Auld D.S., Inglese J. Firefly luciferase in chemical biology: a compendium of inhibitors, mechanistic evaluation of chemotypes, and suggested use as a reporter. Chem. Biol. 2012; 19:1060–1072.
  • 7. Baell J., Walters M.A. Chemistry: chemical con artists foil drug discovery. Nature. 2014; 513:481–483.
  • 8. Aldrich C., Bertozzi C., Georg G.I., Kiessling L., Lindsley C., Liotta D., Merz K.M. Jr, Schepartz A., Wang S. The ecstasy and agony of assay interference compounds. ACS Chem. Neurosci. 2017; 8:420–423.
  • 9. Irwin J.J., Duan D., Torosyan H., Doak A.K., Ziebart K.T., Sterling T., Tumanian G., Shoichet B.K. An aggregation advisor for ligand discovery. J. Med. Chem. 2015; 58:7076–7087.
  • 10. Yang Z.Y., Dong J., Yang Z.J., Lu A.P., Hou T.J., Cao D.S. Structural analysis and identification of false positive hits in luciferase-based assays. J. Chem. Inf. Model. 2020; 60:2031–2043.
  • 11. Yang Z.Y., Yang Z.J., Dong J., Wang L.L., Zhang L.X., Ding J.J., Ding X.Q., Lu A.P., Hou T.J., Cao D.S. Structural analysis and identification of colloidal aggregators in drug discovery. J. Chem. Inf. Model. 2019; 59:3714–3726.
  • 12. Bruns R.F., Watson I.A. Rules for identifying potentially reactive or promiscuous compounds. J. Med. Chem. 2012; 55:9763–9772.
  • 13. Baell J.B., Holloway G.A. New substructure filters for removal of pan assay interference compounds (PAINS) from screening libraries and for their exclusion in bioassays. J. Med. Chem. 2010; 53:2719–2740.
  • 14. Ghosh D., Koch U., Hadian K., Sattler M., Tetko I.V. Luciferase advisor: high-accuracy model to flag false positive hits in Luciferase HTS assays. J. Chem. Inf. Model. 2018; 58:933–942.
  • 15. Stork C., Chen Y., Sicho M., Kirchmair J. Hit Dexter 2.0: machine-learning models for the prediction of frequent hitters. J. Chem. Inf. Model. 2019; 59:1030–1043.
  • 16. Lagorce D., Bouslama L., Becot J., Miteva M.A., Villoutreix B.O. FAF-Drugs4: free ADME-tox filtering computations for chemical biology and early stages drug discovery. Bioinformatics. 2017; 33:3658–3660.
  • 17. Yang J.J., Ursu O., Lipinski C.A., Sklar L.A., Oprea T.I., Bologa C.G. Badapple: promiscuity patterns from noisy evidence. J. Cheminform. 2016; 8:29.
  • 18. Capuzzi S.J., Muratov E.N., Tropsha A. Phantom PAINS: problems with the utility of alerts for pan-assay INterference CompoundS. J. Chem. Inf. Model. 2017; 57:417–427.
  • 19. Jasial S., Gilberg E., Blaschke T., Bajorath J. Machine learning distinguishes with high accuracy between pan-assay interference compounds that are promiscuous or represent dark chemical matter. J. Med. Chem. 2018; 61:10255–10264.
  • 20. Wassermann A.M., Lounkine E., Hoepfner D., Le Goff G., King F.J., Studer C., Peltier J.M., Grippo M.L., Prindle V., Tao J. et al. Dark chemical matter as a promising starting point for drug lead discovery. Nat. Chem. Biol. 2015; 11:958–966.
  • 21. Yang Z.Y., Yang Z.J., He J.H., Lu A.P., Liu S., Hou T.J., Cao D.S. Benchmarking the mechanisms of frequent hitters: limitation of PAINS alerts. Drug Discov. Today. 2021; 26:1353–1358.
  • 22. Yang Z.Y., Dong J., Yang Z.J., Yin M., Jiang H.L., Lu A.P., Chen X., Hou T.J., Cao D.S. ChemFLuo: a web-server for structure analysis and identification of fluorescent compounds. Brief. Bioinform. 2021; 22:bbaa282.
  • 23. Heid E., Greenman K.P., Chung Y., Li S.-C., Graff D.E., Vermeire F.H., Wu H., Green W.H., McGill C.J. Chemprop: a machine learning package for chemical property prediction. J. Chem. Inf. Model. 2023; 64:9–17.
  • 24. Feng B.Y., Shelat A., Doman T.N., Guy R.K., Shoichet B.K. High-throughput assays for promiscuous inhibitors. Nat. Chem. Biol. 2005; 1:146–148.
  • 25. Irwin J.J., Sterling T., Mysinger M.M., Bolstad E.S., Coleman R.G. ZINC: a free tool to discover chemistry for biology. J. Chem. Inf. Model. 2012; 52:1757–1768.
  • 26. Wang Y., Cheng T., Bryant S.H. PubChem BioAssay: a decade's development toward open high-throughput screening data sharing. SLAS Discov. 2017; 22:655–666.
  • 27. Liu T., Lin Y., Wen X., Jorissen R.N., Gilson M.K. BindingDB: a web-accessible database of experimentally determined protein-ligand binding affinities. Nucleic Acids Res. 2007; 35:D198–D201.
  • 28. Stokes J.M., Yang K., Swanson K., Jin W., Cubillos-Ruiz A., Donghia N.M., MacNair C.R., French S., Carfrae L.A., Bloom-Ackermann Z. et al. A deep learning approach to antibiotic discovery. Cell. 2020; 181:475–483.
  • 29. Liu G., Catacutan D.B., Rathod K., Swanson K., Jin W., Mohammed J.C., Chiappino-Pepe A., Syed S.A., Fragis M., Rachwalski K. et al. Deep learning-guided discovery of an antibiotic targeting Acinetobacter baumannii. Nat. Chem. Biol. 2023; 19:1342–1350.
  • 30. Wong F., Zheng E.J., Valeri J.A., Donghia N.M., Anahtar M.N., Omori S., Li A., Cubillos-Ruiz A., Krishnan A., Jin W. et al. Discovery of a structural class of antibiotics with explainable deep learning. Nature. 2024; 626:177–185.
  • 31. Xiong G., Wu Z., Yi J., Fu L., Yang Z., Hsieh C., Yin M., Zeng X., Wu C., Lu A. et al. ADMETlab 2.0: an integrated online platform for accurate and comprehensive predictions of ADMET properties. Nucleic Acids Res. 2021; 49:W5–W14.
  • 32. Wu Z., Jiang D., Wang J., Hsieh C.Y., Cao D., Hou T. Mining toxicity information from large amounts of toxicity data. J. Med. Chem. 2021; 64:6924–6936.
  • 33. Cai H., Zhang H., Zhao D., Wu J., Wang L. FP-GNN: a versatile deep learning architecture for enhanced molecular property prediction. Brief. Bioinform. 2022; 23:bbac408.
  • 34. Chung Y., Vermeire F.H., Wu H., Walker P.J., Abraham M.H., Green W.H. Group contribution and machine learning approaches to predict Abraham Solute parameters, solvation free energy, and Solvation enthalpy. J. Chem. Inf. Model. 2022; 62:433–446.
  • 35. Kingma D.P., Ba J. Adam: a method for stochastic optimization. 2014; arXiv doi: 10.48550/arXiv.1412.6980, 22 December 2014, preprint: not peer reviewed.
  • 36. Pearce B.C., Sofia M.J., Good A.C., Drexler D.M., Stock D.A. An empirical process for the design of high-throughput screening deck filters. J. Chem. Inf. Model. 2006; 46:1060–1068.
  • 37. Yang Z.Y., Yang Z.J., Zhao Y., Yin M.Z., Lu A.P., Chen X., Liu S., Hou T.J., Cao D.S. PySmash: python package and individual executable program for representative substructure generation and application. Brief. Bioinform. 2021; 22:bbab017.
  • 38. Brenk R., Schipani A., James D., Krasowski A., Gilbert I.H., Frearson J., Wyatt P.G. Lessons learnt from assembling screening libraries for drug discovery for neglected diseases. ChemMedChem. 2008; 3:435–444.
  • 39. Brenke J.K., Salmina E.S., Ringelstetter L., Dornauer S., Kuzikov M., Rothenaigner I., Schorpp K., Giehler F., Gopalakrishnan J., Kieser A. et al. Identification of small-molecule frequent hitters of glutathione S-transferase-glutathione interaction. J. Biomol. Screen. 2016; 21:596–607.
  • 40. Schorpp K., Rothenaigner I., Salmina E., Reinshagen J., Low T., Brenke J.K., Gopalakrishnan J., Tetko I.V., Gul S., Hadian K. Identification of small-molecule frequent hitters from AlphaScreen high-throughput screens. J. Biomol. Screen. 2014; 19:715–726.
  • 41. Nelson K.M., Dahlin J.L., Bisson J., Graham J., Pauli G.F., Walters M.A. The essential medicinal chemistry of curcumin. J. Med. Chem. 2017; 60:1620–1637.
  • 42. Agrawal A., Johnson S.L., Jacobsen J.A., Miller M.T., Chen L.H., Pellecchia M., Cohen S.M. Chelator fragment libraries for targeting metalloproteinases. ChemMedChem. 2010; 5:195–199.
  • 43. Huth J.R., Mendoza R., Olejniczak E.T., Johnson R.W., Cothron D.A., Liu Y., Lerner C.G., Chen J., Hajduk P.J. ALARM NMR: a rapid and robust experimental method to detect reactive false positives in biochemical screens. J. Am. Chem. Soc. 2005; 127:217–224.
  • 44. Sushko I., Salmina E., Potemkin V.A., Poda G., Tetko I.V. ToxAlerts: a web server of structural alerts for toxic chemicals and compounds with potential adverse reactions. J. Chem. Inf. Model. 2012; 52:2310–2316.
  • 45. Yang Z.Y., Yang Z.J., Lu A.P., Hou T.J., Cao D.S. Scopy: an integrated negative design python library for desirable HTS/VS database design. Brief. Bioinform. 2021; 22:bbaa194.
  • 46. Seoni S., Jahmunah V., Salvi M., Barua P.D., Molinari F., Acharya U.R. Application of uncertainty quantification to artificial intelligence in healthcare: a review of last decade (2013-2023). Comput. Biol. Med. 2023; 165:107441.
  • 47. Gal Y., Ghahramani Z. Dropout as a bayesian approximation: representing model uncertainty in Deep Learning. International conference on machine learning. PMLR. 2016; 1050–1059.
  • 48. Dolezal J.M., Srisuwananukorn A., Karpeyev D., Ramesh S., Kochanny S., Cody B., Mansfield A.S., Rakshit S., Bansal R., Bois M.C. et al. Uncertainty-informed deep learning models enable high-confidence predictions for digital histopathology. Nat. Commun. 2022; 13:6572.
  • 49. O’Donnell H.R., Tummino T.A., Bardine C., Craik C.S., Shoichet B.K. Colloidal aggregators in biochemical SARS-CoV-2 repurposing screens. J. Med. Chem. 2021; 64:17530–17539.
  • 50. Proj M., Knez D., Sosic I., Gobec S. Redox active or thiol reactive? Optimization of rapid screens to identify less evident nuisance compounds. Drug Discov. Today. 2022; 27:1733–1742.
  • 51. Senger M.R., Fraga C.A., Dantas R.F., Silva F.P. Filtering promiscuous compounds in early drug discovery: is it a good idea? Drug Discov. Today. 2016; 21:868–872.
  • 52. Tian X., Murfin L.C., Wu L., Lewis S.E., James T.D. Fluorescent small organic probes for biosensing. Chem. Sci. 2021; 12:3406–3426.
  • 53. Zhu M., Yang C. Blue fluorescent emitters: design tactics and applications in organic light-emitting diodes. Chem. Soc. Rev. 2013; 42:4963–4976.
  • 54. McGovern S.L., Caselli E., Grigorieff N., Shoichet B.K. A common mechanism underlying promiscuous inhibitors from virtual and high-throughput screening. J. Med. Chem. 2002; 45:1712–1722.
  • 55. Auld D.S., Inglese J., Dahlin J.L. Assay interference by aggregation. In: Markossian S., Grossman A., Arkin M., Auld D., Austin C., Baell J., Brimacombe K., Chung T.D.Y., Coussens N.P., Dahlin J.L. et al. (eds). Assay Guidance Manual. 2004; Bethesda (MD).
  • 56. Galley S.S., Pattenaude S.A., Ray D., Gaggioli C.A., Whitefoot M.A., Qiao Y., Higgins R.F., Nelson W.L., Baumbach R., Sperling J.M. et al. Using redox-active ligands to generate actinide Ligand radical species. Inorg. Chem. 2021; 60:15242–15252.
  • 57. Vidler L.R., Watson I.A., Margolis B.J., Cummins D.J., Brunavs M. Investigating the behavior of published PAINS alerts using a pharmaceutical company data set. ACS Med. Chem. Lett. 2018; 9:792–796.
  • 58. Baker M. Chemists warn against deceptive molecules. Nature. 2017; 541:144–145.
  • 59. Duan D., Doak A.K., Nedyalkova L., Shoichet B.K. Colloidal aggregation and the in vitro activity of traditional Chinese medicines. ACS Chem. Biol. 2015; 10:978–988.
  • 60. Priyadarsini K.I. Photophysics, photochemistry and photobiology of curcumin: studies from organic solutions, bio-mimetics and living cells. J. Photochem. Photobiol. C. 2009; 10:81–95.
  • 61. Greiner D., Bonaldi T., Eskeland R., Roemer E., Imhof A. Identification of a specific inhibitor of the histone methyltransferase SU(VAR)3-9. Nat. Chem. Biol. 2005; 1:143–145.
  • 62. Arrowsmith C.H., Audia J.E., Austin C., Baell J., Bennett J., Blagg J., Bountra C., Brennan P.E., Brown P.J., Bunnage M.E. et al. The promise and peril of chemical probes. Nat. Chem. Biol. 2015; 11:536–541.
  • 63. Cherblanc F.L., Chapman K.L., Reid J., Borg A.J., Sundriyal S., Alcazar-Fuoli L., Bignell E., Demetriades M., Schofield C.J., DiMaggio P.A. Jr et al. On the histone lysine methyltransferase activity of fungal metabolite chaetocin. J. Med. Chem. 2013; 56:8616–8625.
  • 64. Ryan A.J., Gray N.M., Lowe P.N., Chung C.W. Effect of detergent on “promiscuous” inhibitors. J. Med. Chem. 2003; 46:3448–3451.
  • 65. Coan K.E., Shoichet B.K. Stability and equilibria of promiscuous aggregates in high protein milieus. Mol. Biosyst. 2007; 3:208–213.
  • 66. Inglese J., Auld D.S., Jadhav A., Johnson R.L., Simeonov A., Yasgar A., Zheng W., Austin C.P. Quantitative high-throughput screening: a titration-based approach that efficiently identifies biological activities in large chemical libraries. Proc. Natl. Acad. Sci. U.S.A. 2006; 103:11473–11478.
  • 67. Seidler J., McGovern S.L., Doman T.N., Shoichet B.K. Identification and prediction of promiscuous aggregating inhibitors among known drugs. J. Med. Chem. 2003; 46:4477–4486.
  • 68. Ferreira R.S., Simeonov A., Jadhav A., Eidam O., Mott B.T., Keiser M.J., McKerrow J.H., Maloney D.J., Irwin J.J., Shoichet B.K. Complementarity between a docking and a high-throughput screen in discovering new cruzain inhibitors. J. Med. Chem. 2010; 53:4891–4905.
  • 69. VanDongen A.M. Drug promiscuity: problems and promises. Biology and Medicine. 2024; 16:649.

Articles from Nucleic Acids Research are provided here courtesy of Oxford University Press
