Abstract
Identification of anti-SARS-CoV-2 compounds through traditional high-throughput screening (HTS) assays is limited by high costs and low hit rates. To address these challenges, we developed machine learning models to identify compounds acting via inhibition of the entry of SARS-CoV-2 into human host cells or the SARS-CoV-2 3-chymotrypsin-like (3CL) protease. The optimal classification models achieved good performance with the area under the receiver operating characteristic curve (AUC-ROC) values > 0.78. Experimental validation showed that the best performing models increased the assay hit rate by 2.1-fold for viral entry inhibitors and 10.4-fold for 3CL protease inhibitors compared to the original drug repurposing screens. Twenty-two compounds showed potent (< 5 μM) antiviral activities in a SARS-CoV-2 live virus assay. In conclusion, machine learning models can be developed and used as a complementary approach to HTS to expand compound screening capacities and improve the speed and efficiency of anti-SARS-CoV-2 drug discovery.
INTRODUCTION
The current global pandemic of coronavirus disease 2019 (COVID-19) is caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), a highly infectious enveloped RNA virus 1, 2. The transmission of SARS-CoV-2 occurs primarily through inhalation of respiratory droplets from infected individuals and/or contact with virally contaminated objects 3. The initial phase of SARS-CoV-2 infection is viral entry into host cells through the interaction of the receptor-binding domain (RBD) of the viral Spike glycoprotein with angiotensin-converting enzyme 2 (ACE2) on the cell surface 4, 5. Subsequently, SARS-CoV-2 initiates RNA replication and eventually assembles new virions that are released to infect other cells in the host 6. Each stage of the SARS-CoV-2 life cycle (e.g., viral entry and viral replication) could be targeted for the development of specific antiviral drug candidates for COVID-19 treatment 7, 8.
Drug repurposing has been widely used to identify new clinical indications for existing drugs in the treatment of many diseases, including viral infections. During the COVID-19 outbreak, several high-throughput drug repurposing assays were developed and applied to identify potential inhibitors of SARS-CoV-2 entry and replication at the National Center for Advancing Translational Sciences (NCATS) of the National Institutes of Health (NIH) 9–11. For example, the pseudotyped particle (PP) entry assay is a cell-based assay with a luminescence readout, which can facilitate the identification of viral cell entry inhibitors using pseudotyped viral particles containing coronavirus Spike glycoprotein without the viral genome 9. Another potential target for the development of antiviral therapies is the SARS-CoV-2 3-chymotrypsin-like (3CL) protease, which plays a vital role in viral replication by cleaving the viral polyprotein to form the RNA replicase-transcriptase complex 8. The 3CL protease assay is a fluorescence-based biochemical assay that measures the inhibitory effect of compounds on the activity of SARS-CoV-2 3CL protease 11. However, resource limitations make it impractical to screen large chemical libraries with these assays 12, 13. In addition, millions of commercially available compounds cannot practically be included in high-throughput screening (HTS) because they are not part of our internal compound collections.
As an alternative to physical HTS, computational models such as machine learning classifiers can make predictions on new, unseen data based on patterns learned from known data, and they have been widely used to virtually screen millions of compounds for potential biological activities 14–16. Among the machine learning models, quantitative structure-activity relationship (QSAR) approaches enable the prediction of biological activities for compounds of interest as a function of similarity in chemical structure (i.e., molecular descriptors) 17, 18. Unlike QSAR, biological activity-based modeling (BABM) approaches build on the hypothesis that compounds showing similar biological activity patterns tend to share similar biological targets or mechanisms of action 16, 19. The combined use of these two methods offers complementary advantages, such that their application domains are not limited to small molecules with well-defined structures (as required by QSAR) or substances with available biological profiling (as required by BABM). The PP entry and 3CL protease assays have been applied to screen several thousand known bioactive compounds, including the NCATS Pharmaceutical Collection (NPC) of approved and investigational drugs 9–11, 20. The data generated from these assays can be used to build computational models to predict new PP entry or 3CL protease inhibitors.
Given the fast-growing number of COVID-19 patients and the current lack of effective drug treatments, there is an urgent need to accelerate efforts in exploring new potential drugs to treat COVID-19. In this study, we applied machine learning methods, including both structure-based (QSAR) and activity-based (BABM) approaches, to build models for predicting potential inhibitors of SARS-CoV-2 entry and 3CL protease. Compounds predicted as hits by these models were first tested in the SARS-CoV-2 PP entry assay or the 3CL protease assay. The antiviral activities of the compounds were then confirmed in a cell-based live SARS-CoV-2 cytopathic effect (CPE) reduction assay 21, 22.
RESULTS
Construction and evaluation of the prediction models
A total of 8,149 unique compounds were screened against either the PP entry assay (2,725 compounds) or the 3CL protease assay (8,044 compounds) (Figure 1A and Supplementary Table S1). These compounds fell into different activity categories (Figure 1A). For example, 570 compounds were active only in the PP entry assay, 142 compounds were only active in the 3CL protease assay, and 15 compounds were active against both PP entry and 3CL protease. Based on these data, we built two types of models, QSAR and BABM, for the prediction of PP entry or 3CL protease inhibitors (Figure 1B–1F). We tested various parameter combinations on the training datasets to find the optimal model for each assay target (Figure 1B–1D). For example, the combination of RF (machine learning algorithm), Original (rebalancing strategy), and 157 ECFP4 features (Fisher’s exact test with P value = 0.02) produced the best classification performance (AUC-ROC = 0.77±0.02) for predicting PP entry inhibitors, and the corresponding parameters for the best performing 3CL protease model (AUC-ROC = 0.90±0.03) were SVM, ROSE and 80 ECFP4 features (Fisher’s exact test with P value = 0.05). The optimal QSAR models also showed good prediction performance on the external validation dataset for PP entry (AUC-ROC = 0.78) and 3CL protease (AUC-ROC = 0.88) (Figure 1E). The feature sets that produced the optimal QSAR models for the prediction of PP entry or 3CL protease inhibitors are listed in Supplementary Table S2, and the AUC-ROC values obtained from all QSAR models are listed in Supplementary Table S3. We observed large differences among the models built with different methods and feature sets. The AUC-ROC values for predicting PP entry inhibitors ranged from 0.64 to 0.78 with an average of 0.73, while the AUC-ROC values for predicting 3CL protease inhibitors ranged from 0.64 to 0.90 with an average of 0.81. 
For the BABM models, the AUC-ROC values on the test sets ranged from 0.84 to 0.88 for the PP entry inhibitor models and 0.85 to 0.89 for the 3CL protease inhibitor models, with the combined model (CM-M, structure (ToxPrint)-based model + BABM-M) yielding the best performance (PP entry: AUC-ROC = 0.88±0.01, 3CL protease: AUC-ROC = 0.89±0.03) (Figure 1F).
Figure 1. Construction and evaluation of the optimal machine learning classification models.

(A) Activity distribution of compounds in the original drug repurposing screens, including the PP entry assay and the 3CL protease assay. (B-D) The optimal QSAR models built on the training dataset, including model performance (B), number of selected features (C), and data rebalancing strategies and machine learning algorithms (D). (E) Example ROC curves of the optimal QSAR models on the external validation dataset. (F) BABM model performance measured by AUC-ROC. Abbreviations: ROSE, Random Over Sampling Examples; SVM, support vector machine; RF, Random Forest; QSAR, quantitative structure (ECFP4)-activity relationships; SBM, structure (ToxPrint)-based model; BABM-M, activity-based model (MLS); BABM-S, activity-based model (Sytravon); CM-M, combined model (SBM + BABM-M); CM-S, combined model (SBM + BABM-S).
Virtual screening and experimental validation of model predicted compounds
In an attempt to identify novel inhibitors of SARS-CoV-2 entry and 3CL protease, the optimal models were applied to virtually screen a large collection of ~360K compounds. For the QSAR models, a total of 4,868 compounds with predicted probabilities > 0.5 were collected, including 2,743 predicted PP entry inhibitors and 2,125 predicted 3CL protease inhibitors. For both sets of predictions, the density distribution of the predicted probabilities exhibited a single peak at relatively low values (peak value < 0.56), where most compounds fell (Supplementary Tables S4 and S5, and Figure 2A). For the BABM models, a total of 847 compounds were collected, including 485 predicted PP entry inhibitors and 364 predicted 3CL protease inhibitors (Supplementary Tables S6 and S7). After combining these compounds and filtering by structure (see Methods for details), a total of 1,972 predicted PP entry inhibitors and 1,493 predicted 3CL protease inhibitors were selected for experimental validation (Supplementary Tables S4–S7). For the QSAR models, when the default probability cut-off of 0.5 was used, the model increased the assay hit rate by 1.6-fold (from 21.5% to 34.9%) for the PP entry inhibitors and by 8.4-fold (from 1.76% to 14.8%) for the 3CL protease inhibitors (Figure 2B). As the probability cut-off increased, the hit rates of the QSAR model predictions gradually increased as well (Figure 2C). When the probability cut-off was set to 0.9, the hit rates of the QSAR model predictions reached 66% for the PP entry assay and 22% for the 3CL protease assay (Figure 2C). Moreover, the BABM models increased the assay hit rate by 2.1-fold (from 21.5% to 45.6%) for the PP entry assay and by 10.4-fold (from 1.76% to 18.4%) for the 3CL protease assay (Figure 2B).
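The hit-rate arithmetic in this section can be illustrated with a short sketch. The Python code below is not the authors' analysis code (which was written in R); the function names and the toy probabilities are hypothetical, and only the percentages in `reported` come from the text. Because the inputs are the rounded percentages quoted above, the recomputed fold changes agree with the reported ones only to within rounding.

```python
def fold_enrichment(model_hit_rate, baseline_hit_rate):
    """Ratio of the model hit rate to the hit rate of the original screen."""
    if baseline_hit_rate <= 0:
        raise ValueError("baseline_hit_rate must be positive")
    return model_hit_rate / baseline_hit_rate


def hit_rate_at_cutoff(probabilities, outcomes, cutoff):
    """Hit rate among compounds whose predicted probability exceeds `cutoff`.

    `outcomes` holds 1 for compounds confirmed active in the assay and 0
    otherwise. Returns None when no compound passes the cutoff.
    """
    selected = [y for p, y in zip(probabilities, outcomes) if p > cutoff]
    if not selected:
        return None
    return sum(selected) / len(selected)


# (baseline %, model %) pairs as reported in the text.
reported = {
    "PP entry, QSAR": (21.5, 34.9),
    "3CL protease, QSAR": (1.76, 14.8),
    "PP entry, BABM": (21.5, 45.6),
    "3CL protease, BABM": (1.76, 18.4),
}
folds = {name: fold_enrichment(model, base)
         for name, (base, model) in reported.items()}

# Hypothetical toy predictions, used only to show the effect of the cutoff.
toy_probabilities = [0.95, 0.92, 0.80, 0.70, 0.60, 0.55]
toy_outcomes = [1, 1, 0, 1, 0, 0]
rate_at_05 = hit_rate_at_cutoff(toy_probabilities, toy_outcomes, 0.5)
rate_at_09 = hit_rate_at_cutoff(toy_probabilities, toy_outcomes, 0.9)
```

In the toy data, raising the cutoff from 0.5 to 0.9 keeps fewer compounds but a larger fraction of them are active, which mirrors the trend described for Figure 2C.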
Figure 2. Virtual screening of a large compound library based on the optimal models.

(A) Distributions of the predicted probabilities of QSAR model identified compounds. (B) Comparison of hit rates between the original drug repurposing screens and model predictions. (C) Hit rates of the QSAR model predictions based on different prediction probability cutoffs. (D) Hit rates of the potential PP entry inhibitors and 3CL protease inhibitors in the CPE assay.
Secondary experimental confirmation of model predicted PP entry and 3CL protease inhibitors
To further confirm the experimentally validated predictions, a total of 672 compounds, including 446 PP entry inhibitors and 226 3CL protease inhibitors, were re-tested at 11 concentrations (Supplementary Tables S8 and S9). For the PP entry assay, 328 of the 446 compounds remained active, yielding a confirmation rate of 74%. Of the confirmed PP entry inhibitors, 149 were known drugs or bioactive compounds, while the other 179 inhibitors were diverse compounds without any well-annotated biological activity (Supplementary Table S8). For the 3CL protease assay, 148 of the 226 compounds remained active, yielding a confirmation rate of 65%. Of the confirmed 3CL protease inhibitors, 62 were known drugs or bioactive compounds, while the other 86 inhibitors were diverse compounds without any well-annotated biological activity (Supplementary Table S9). The most potent PP entry inhibitor was NCGC00390584 (Exatecan, IC50 = 3.1 nM), and the most potent 3CL protease inhibitor was NCGC00390337 (Z-DQMD-FMK, IC50 = 0.92 μM) (Supplementary Tables S8 and S9). Several representative compounds with inhibitory effects against SARS-CoV-2 PP entry or 3CL protease are shown in Figure 3 and Figure 4, such as the PP entry inhibitor NCGC00599688 (AQ-13, IC50 = 14.23±11.66 μM) and the 3CL protease inhibitor NCGC00371011 (Fluorobexarotene, IC50 = 28.95±2.35 μM).
Figure 3. Concentration-response curves of representative PP entry inhibitors in the anti-SARS-CoV-2 PP entry assay and the CPE assay.

PP, PP entry assay; TOX, cytotoxicity assay; CPE, CPE assay; IC50, half maximal inhibitory concentration; AC50, half-maximal activity concentration. Results are presented as mean ± SD, and the error bars represent the SD of two independent experiments.
Figure 4. Concentration-response curves of representative 3CL protease inhibitors in the anti-SARS-CoV-2 3CL protease assay and the CPE assay.

3CL, 3CL protease assay; RFU, relative fluorescence unit; TOX, cytotoxicity assay; CPE, CPE assay; IC50, half maximal inhibitory concentration; AC50, half-maximal activity concentration. Results are presented as mean ± SD, and the error bars represent the SD of two independent experiments.
Assessment of compound antiviral activity in the SARS-CoV-2 CPE assay
A total of 578 PP entry inhibitors and 150 3CL protease inhibitors were further tested in a SARS-CoV-2 live virus assay, the CPE assay. The results showed that 28.2% of the PP entry inhibitors and 15.3% of the 3CL protease inhibitors were active in the CPE assay (Supplementary Tables S4–S7 and S10, and Figure 2D). For secondary confirmation, the 127 compounds that showed activity in the CPE assay were re-tested at eight or more concentrations (instead of four or five concentrations in the first screen) to further confirm their activity and obtain more accurate potency measures. Of the 127 compounds, 122 remained active, yielding a confirmation rate of 96% for the SARS-CoV-2 CPE assay (Supplementary Table S11 and Figure 5). Among the 122 confirmed CPE actives, potencies ranged from 0.20 μM (NCGC00345807, CAA-0225) to 22 μM (NCGC00417833), with an average potency of 10.4 μM (Supplementary Table S11). Moreover, 22 compounds showed potencies < 5 μM, accounting for 18% of the total CPE actives (Supplementary Table S11). In addition, 55 CPE actives were known drugs or bioactive compounds, while the other 67 had no well-annotated biological activity (Supplementary Table S11). Six representative compounds with potent anti-SARS-CoV-2 activity (< 5 μM) in the CPE assay are shown in Figure 5, including NCGC00345807 (CAA-0225, AC50 = 0.20±0.07 μM), NCGC00161621 (Cepharanthine, AC50 = 1.41±0.39 μM), MLS000703078 (AC50 = 1.78±0.00 μM), NCGC00390625 (Maropitant, AC50 = 3.55±1.12 μM), NCGC00599688 (AQ-13, AC50 = 3.98±0.00 μM), and NCGC00017063 (Amodiaquine, AC50 = 4.14±1.41 μM).
Figure 5. Secondary experimental confirmation of the potential CPE actives and concentration-response curves of representative compounds.

The heatmap shows the overall potencies of compounds confirmed in the CPE assay (inner ring) and the cytotoxicity counter screen (outer ring). The heat map is colored by compound potency, such that darker shades of red indicate more potent compounds and lighter shades of blue indicate less potent compounds. Gray indicates missing values. The outer yellow line represents known bioactive compounds, and the outer green line represents compounds with no previously reported bioactivity. Concentration-response curves are shown for representative compounds with potent anti-SARS-CoV-2 activity (< 5 μM). CPE, CPE assay; TOX, cytotoxicity assay; AC50, half-maximal activity concentration. Results are presented as mean ± SD, and the error bars represent the SD of two independent experiments.
DISCUSSION AND CONCLUSIONS
Identification of inhibitors of SARS-CoV-2 entry into host cells and of the 3CL protease through compound screening can yield lead compounds that may be developed into anti-COVID-19 therapeutics. In this study, we built machine learning classification models based on the data generated from the high-throughput anti-SARS-CoV-2 drug repurposing assays, and applied the optimal models to virtually screen a large collection of diverse compounds. In total, 122 model-predicted compounds were validated experimentally for their anti-SARS-CoV-2 activities in the live SARS-CoV-2 CPE assay.
The circular topological descriptor ECFP4, which has been widely used in drug discovery 23, was used to build the QSAR models in this study. Since the ECFP4 fingerprint has 1,024 bits, feature selection was performed to avoid overfitting and possibly improve the prediction performance (Figure 1C). Because of the imbalanced nature of the original assay outcomes, i.e., the large prevalence of inactive compounds compared to active compounds in a dataset, five rebalancing strategies were applied prior to modeling. The best rebalancing strategy found for the 3CL protease dataset is consistent with previous research findings that ROSE worked well on improving the predictive power of the models 24. However, the original data without any rebalancing achieved the best performance for the PP entry dataset (Figure 1D), indicating that a balanced dataset does not necessarily result in better performance. Consistent with previous reports on constructing and optimizing classification models 25–27, optimal model performance is often the result of a combination of the best feature set, rebalancing strategy, and machine learning algorithm (Figure 1B–1D). The optimal QSAR models achieved good performance on the external validation dataset with AUC-ROC values of 0.78 for predicting PP entry inhibitors and 0.88 for predicting 3CL protease inhibitors (Figure 1E). Our QSAR model for predicting 3CL protease inhibitors outperformed the previously reported models (AUC-ROC = 0.71), which were constructed on the same dataset as the one used in this study, but without applying any rebalancing strategy 28. In addition, the BABM models combining activity and structure data showed better performance (AUC-ROC > 0.84) than the QSAR models (Figure 1), further confirming the value of biological activity profiles in improving prediction performance 16.
These findings are consistent with other studies demonstrating that combining in vitro bioactivity with chemical structure descriptors could improve the predictive performance of machine learning models 27, 29. To improve our model, 99 3CL protease inhibitors with IC50 <50 μM were retrieved from the literature (Supplemental Table S12) 30–37. For each compound in a compound set, we identified its closest structural neighbor by calculating the Tanimoto coefficient (TC). The TC between a compound and its closest structural neighbor is defined as maxTC. The average maxTC value (AmaxTC) of the 99 3CL protease inhibitors was 0.74, which was much larger than that of the 3CL protease inhibitors in the original model training set (AmaxTC = 0.32) and the AmaxTC between the literature 3CL inhibitor set and the original model training set (AmaxTC = 0.28) (Supplemental Figure S1). These findings indicate that the newly added compounds are structurally similar to each other but distinct from the original model training set, thus adding the literature compounds increased the structural diversity of the training set. After adding the 99 3CL protease inhibitors to the model training set, we found that the RF model produced better performance (AUC-ROC value = 0.86±0.04) based on the three-fold cross-validation results. Using the best threshold (0.07) defined by the Youden index, the RF model was able to correctly identify 94 (95%) of the 99 literature 3CL protease inhibitors (Supplemental Table S12). The addition of the 3CL protease inhibitors from the literature helped to expand the applicability domain of our model, and further demonstrated the reliability and robustness of our model.
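The maxTC and AmaxTC definitions above can be sketched as follows. This is a minimal illustration, assuming each fingerprint is represented as a set of on-bit indices; the function names and the toy fingerprints are hypothetical and are not taken from the study, and the authors' own implementation is not described in the text.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bits."""
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / union


def max_tc_within(fingerprints):
    """maxTC of each compound to its closest neighbor in the same set."""
    if len(fingerprints) < 2:
        raise ValueError("need at least two compounds")
    return [
        max(tanimoto(fp, other)
            for j, other in enumerate(fingerprints) if j != i)
        for i, fp in enumerate(fingerprints)
    ]


def max_tc_between(query_set, reference_set):
    """maxTC of each query compound to its closest neighbor in another set."""
    return [max(tanimoto(q, r) for r in reference_set) for q in query_set]


def amax_tc(max_tcs):
    """Average of the maxTC values (AmaxTC)."""
    return sum(max_tcs) / len(max_tcs)


# Hypothetical toy fingerprints: the "literature" set is internally similar
# but shares few bits with the "training" set.
literature = [{1, 2, 3, 4}, {1, 2, 3, 5}, {1, 2, 3, 4, 5}]
training = [{10, 11, 12}, {1, 10, 20, 21}]

amax_within = amax_tc(max_tc_within(literature))
amax_between = amax_tc(max_tc_between(literature, training))
```

A high within-set AmaxTC combined with a low between-set AmaxTC, as in this toy case, is the pattern the text reports for the literature inhibitors relative to the original training set.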
The optimal models (including QSAR and BABM) can be used to rapidly screen large compound libraries to identify potential anti-SARS-CoV-2 compounds and prioritize them for experimental validation (Figure 2). Compared with the original HTS assays, the optimal machine learning models increased the assay hit rate by 2.1-fold for viral entry inhibitors and 10.4-fold for 3CL protease inhibitors, resulting in hit rates of 45.6% for the PP entry assay and 18.4% for the 3CL protease assay from a large, diverse compound library (Figure 2B). The model hit rates were calculated based on the experimental validation results. For experimental validation, the model-predicted active compounds were tested in the same assay as the one used for the original drug repurposing screen 9–11. The hit rate of the experimental validation screen is referred to as the model hit rate, which is equivalent to the positive predictive value [PPV = TP/(TP + FP)] (Figure 2B–2C). As the probability cut-off increased, the model hit rates (i.e., PPVs) and specificities [TN/(TN + FP)] of both the PP entry and 3CL protease models increased, while their sensitivities [TP/(TP + FN)] decreased. These hit rates are substantially higher than that of a typical HTS of a diverse compound library, which is < 1.5%, demonstrating that our models could considerably improve the efficiency of anti-COVID-19 drug lead identification. Multi-target drugs, i.e., compounds that interact with multiple targets in the biological network simultaneously, have been widely used in the field of drug discovery, especially for the treatment of complex diseases 38, 39. COVID-19 has emerged as a complex disease that presents variable clinical symptoms and disease progression (e.g., asymptomatic, acute respiratory distress syndrome, and multi-organ failure) 40.
In this study, the PP entry assay is a phenotypic assay that covers multiple targets in the viral entry process, while the 3CL protease assay is a single-target assay. When evaluating the antiviral activities of the hits identified from these two assays, we found that the PP entry inhibitors had a higher hit rate (28.2%) than the 3CL protease inhibitors (15.3%) in the CPE assay (Figure 2D), supporting the view that multi-target compounds could be more effective antivirals than single-target compounds 41. In addition, these results suggest that multi-target assays may be an important direction in new assay development for more efficient anti-SARS-CoV-2 drug discovery, whereas single-target assays are more suitable for investigating compound mechanisms of action 42. Although the addition of DTT in the 3CL protease assay could reverse the inhibitory effect of compounds with electrophilic warheads, this may not have a dramatic impact on our results. For example, the IC50 of NCGC00390337 (Z-DQMD-FMK, the most potent 3CL protease inhibitor in our results) was 0.92 μM, which was not significantly different from the finding of Sun et al. (IC50 = 0.94 μM) 43. In this study, our models identified 122 compounds that were active in the SARS-CoV-2 CPE assay (Supplementary Table S11), and these compounds could be further developed as potential anti-COVID-19 treatments. Among the 22 potent compounds with AC50 < 5 μM, ten were known drugs or bioactive compounds, while the others were diverse compounds without any well-annotated biological activities. Four of these compounds (i.e., NCGC00345807, NCGC00161621, NCGC00386665, and NCGC00017063) have been reported to have anti-SARS-CoV-2 activities. The most potent compound is NCGC00345807 (CAA-0225, a cathepsin L-specific inhibitor, AC50 = 0.20 μM) (Figure 5 and Supplementary Table S11), which was also reported as a lead anti-SARS-CoV-2 compound in a previous study 22.
Based on our results, the anti-SARS-CoV-2 mechanism of CAA-0225 may be attributed partly to the inhibition of SARS-CoV-2 entry into the host cells (IC50 = 0.60 μM, efficacy = 89%) (Supplementary Table S8). NCGC00161621 (Cepharanthine), an approved drug with anti-inflammatory activities, has been reported to rescue the CPE of SARS-CoV-2 to full efficacy probably due to the inhibition of spike-mediated cell entry or SARS-CoV-2 replication 9, 44, 45. Consistent with these previous studies, we also found that Cepharanthine showed potency against SARS-CoV-2 CPE effect with an AC50 of 1.41 μM, and its potential mechanism of action is to inhibit virus entry into host cells (IC50 = 6.88 μM, efficacy = 102%) (Figure 5, Supplementary Table S11, and S8). NCGC00386665 (Bemcentinib), a selective small-molecule inhibitor of AXL kinase, has been reported as a dual inhibitor of SARS-CoV-2 papain-like protease and 3CL protease based on molecular docking. This compound was also reported to potentially reduce viral infection and block the SARS-CoV-2 Spike protein according to a multicenter, seamless, phase 2 adaptive randomization platform study 46. Consistent with these findings, in our study Bemcentinib showed potent activity against the SARS-CoV-2 CPE with an AC50 of 3.36 μM, potentially via inhibition of virus entry into host cells (IC50 = 19 μM, efficacy = 112%) (Supplementary Table S11, and S8). NCGC00017063 (Amodiaquine), an antimalarial and anti-inflammatory agent, was reported to show activity in the anti-SARS-CoV-2 CPE assay 45, 47, 48. Consistent with the previous report, Amodiaquine showed potent activity against the SARS-CoV-2 CPE in our study with an AC50 of 3.98 μM, potentially by inhibiting virus entry into host cells (IC50 = 19 μM, efficacy = 103%) (Figure 5, Supplementary Table S11, and S8). 
The other eighteen potent compounds have no previously reported antiviral activity; therefore, in-depth investigations are needed to explore them as potential drug candidates for the treatment of COVID-19. Six of the 22 potent compounds (i.e., NCGC00599688, NCGC00590975, NCGC00390625, NCGC00263128, NCGC00482724, NCGC00484976) have been reported to have biological activities, but not antiviral activities. For example, NCGC00599688 (AQ-13), an analogue of Chloroquine (CQ), is an investigational antimalarial drug 49. NCGC00590975 (JQEZ5) is a potent pharmacologic Enhancer of Zeste Homolog 2 (EZH2) inhibitor that exhibits significant in vivo anti-tumor activity in EZH2 mutant cancer models 50. NCGC00390625 (Maropitant) is a selective neurokinin 1 receptor antagonist that is clinically used as a new anti-emetic drug for dogs 51. NCGC00263128 (PD 0220245) shows inhibitory effects on both IL-8 receptor binding and IL-8-mediated neutrophil chemotaxis 52. NCGC00482724 (Vacquinol-1), a quinolone derivative, has been reported to display potent anti-tumor effects by inducing rapid cell death in glioblastomas 53. NCGC00484976 (Z36), a novel B-cell lymphoma-extra large (Bcl-xL) protein inhibitor, could efficiently induce autophagic cell death by blocking the interaction between Bcl-xL/Bcl-2 and Beclin-1 54. The remaining twelve compounds have no reports on their biological activities. Because developing a new drug takes a long time, in-depth investigation is needed to explore these twenty-two compounds as potential drug candidates for the treatment of COVID-19.
In summary, we applied machine learning classification models, including QSAR and BABM models, to identify inhibitors of SARS-CoV-2 entry into host cells and of the SARS-CoV-2 3CL protease from a large, diverse compound library. The optimal models increased the hit rates of the anti-SARS-CoV-2 HTS assays by 2.1-fold for the PP entry assay and 10.4-fold for the 3CL protease assay. Twenty-two compounds showed potent (< 5 μM) anti-SARS-CoV-2 activities in a live virus assay, for 18 of which the anti-SARS-CoV-2 activities have not been reported previously. These compounds have the potential to be developed as novel anti-COVID-19 drug leads. Therefore, machine learning classification models can be used as a complementary approach to HTS to improve the speed and efficiency of anti-SARS-CoV-2 drug discovery.
Experimental Section
HTS assays
All assays were performed according to protocols described previously 16. Briefly, the SARS-CoV-2 PP entry assay was performed in a human ACE2 (HEK293-ACE2) cell line under biosafety level 2 (BSL-2) containment. After 48 hours of incubation for PP entry into cells and luciferase reporter expression, luciferase activity was measured using Bright-Glo Luciferase Assay reagents (Promega). The 3CL protease assay was performed in black, medium-binding microplates (Greiner Bio-One) with enzyme (50 nM) in reaction buffer and substrate. After one hour incubation for the enzyme reaction, fluorescence intensity was measured at excitation and emission wavelengths of 340 nm and 460 nm, respectively. Compounds were tested as five-point 1:5 titrations (experimental validation) or 11-point 1:3 titrations (experimental confirmation), starting from 57.5 μM, for both the PP entry assay and 3CL enzyme assay. The live SARS-CoV-2 CPE assay was performed at a contractor BSL-3 facility at the Southern Research Institute (Birmingham, AL). In addition, the cytotoxicity of these compounds was evaluated in a cell viability assay that measures intracellular ATP content using a PHERAstar FSX plate reader (BMG LABTECH). All compound libraries used in this study were assembled within NCATS. The purity of lead compounds was determined to be greater than 95% by HPLC and copies of the HPLC traces were provided in Supplementary Figure S1.
Data collection for modeling
NCATS in-house collections of bioactive compounds including approved and investigational drugs were screened against the PP entry assay and the 3CL protease assay in quantitative HTS (qHTS) format 9, 11. Detailed descriptions of these high-throughput drug repurposing assays and all the screening data are publicly available through the NCATS/NIH open science data portal (OpenData, https://opendata.ncats.nih.gov/covid19/). The qHTS data were analyzed using custom software developed in house at NCATS. Briefly, concentration-response curves were fit to a four-parameter Hill equation yielding concentrations of half-maximal activity (AC50) and maximal response (efficacy) values 55, 56. Compounds were further designated as class 1–4 based on the shape of the concentration-response curve 57. Compounds that exhibited activation effects were assigned class 1.1, 1.2, 1.3, 1.4, 2.1, 2.2, 2.3, 2.4, and 3 curves. Compounds that exhibited inhibitory effects were assigned class −1.1, −1.2, −1.3, −1.4, −2.1, −2.2, −2.3, −2.4, and −3 curves. Compounds that showed no significant concentration response were considered inactive and assigned class 4. For modeling purposes, we assigned each compound one of three outcomes: active (1), inactive (0) or inconclusive (excluded). For the PP entry assay, compounds that showed inhibition with >50% efficacy and were inactive or at least 6-fold less potent in the cell viability counter assay were considered active (1), compounds with a positive curve class were considered inactive (0), and all other compounds were considered inconclusive and excluded from modeling. For the 3CL protease assay, compounds that showed inhibition with >50% efficacy were considered active (1), compounds with a positive curve class were considered inactive (0), and all other compounds were considered inconclusive and excluded from modeling.
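As a point of reference for the curve fitting mentioned above, the sketch below evaluates a four-parameter Hill model over the 11-point 1:3 titration described under HTS assays. The parameterization (baseline, efficacy, AC50, Hill slope) is a common textbook form and is an assumption here; the exact form used by the NCATS in-house software is not given in the text. The function names and the example compound parameters are hypothetical, and no fitting is performed.

```python
def hill_response(conc, ac50, efficacy, hill_slope=1.0, baseline=0.0):
    """Response predicted by a four-parameter Hill model.

    baseline:   response at zero concentration
    efficacy:   response at saturating concentration
    ac50:       concentration giving the half-maximal response
    hill_slope: steepness of the curve (assumed positive)
    """
    if conc <= 0:
        return baseline
    return baseline + (efficacy - baseline) / (1.0 + (ac50 / conc) ** hill_slope)


def titration(top_conc, dilution_factor, n_points):
    """Concentrations of a serial dilution, highest first."""
    return [top_conc / dilution_factor ** i for i in range(n_points)]


# 11-point 1:3 titration starting from 57.5 uM, as described in the text.
concentrations = titration(57.5, 3, 11)

# Hypothetical inhibitor: AC50 = 1.0 uM, 100% maximal inhibition.
responses = [hill_response(c, ac50=1.0, efficacy=100.0) for c in concentrations]
half_max = hill_response(1.0, ac50=1.0, efficacy=100.0)
```

By construction, the response at a concentration equal to the AC50 lies halfway between the baseline and the efficacy, which is what makes AC50 a convenient potency measure.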
The activity assignments for all the compounds included in constructing the PP entry assay and 3CL protease assay models are provided in Supplemental Table S1, along with their corresponding structures.
QSAR modeling
The two-dimensional structures of all compounds encoded in SMILES strings were converted to the Extended Connectivity Fingerprints radius 4 (ECFP4) using the Chemistry Development Kit (CDK) package 61 in the Konstanz Information Miner open-source software (KNIME, version 4.0.2, https://www.knime.org/) 62. ECFP4 encodes circular topological fragments into a fixed length binary fingerprint (n = 1,024) where the presence or absence of the feature was recorded in a binary system as 1 or 0, respectively 62.
Five machine learning classifiers were built and tested in R (version 3.4.2): Naïve Bayes (NB) and support vector machine (SVM) classifiers from the “e1071” package, a random forest (RF) classifier from the “randomForest” package, a neural network (NNET) classifier from the “nnet” package, and an eXtreme gradient boosting (XGboost) classifier from the “xgboost” package 58. The NB classifier was implemented with Laplace smoothing, and the SVM classifier used a Gaussian radial basis function kernel. All other parameters of the SVM, RF and NNET classifiers were left at their default values. The parameters of the XGboost classifier were set as follows: maximum depth of a tree, 3; learning rate, 0.01; and subsample ratio of columns when constructing each tree, 0.5.
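For readers who wish to reproduce the XGboost settings, the three stated values correspond to the xgboost library parameters `max_depth`, `eta` and `colsample_bytree`. The fragment below records that mapping as a Python dictionary; the variable name is illustrative, and any parameter not listed in the text (for example the objective or the number of boosting rounds) is deliberately left out because the source does not state it.

```python
# Stated XGboost settings, keyed by the library's own parameter names.
XGB_PARAMS = {
    "max_depth": 3,           # maximum depth of a tree
    "eta": 0.01,              # learning rate (shrinkage applied per boosting step)
    "colsample_bytree": 0.5,  # subsample ratio of columns when constructing each tree
}
```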
To optimize model performance, feature selection was performed prior to machine learning modeling using four different methods: Fisher’s exact test P value, area under the receiver operating characteristic curve (AUC-ROC) value, Gini score from the RF algorithm, and Gain score from the XGboost algorithm. For the Fisher’s exact test method, features were selected at five P value cutoffs, from 0.01 to 0.05 in steps of 0.01. For the AUC-ROC method, features were selected at four cutoffs, from 0.52 to 0.58 in steps of 0.02, using the “pROC” package 59. For the RF and XGboost methods, features were ranked using the “randomForest” and “xgboost” 58 packages, respectively, and the top-ranked features by Gini or Gain score were selected in steps of 10, from the top 10 to the top 50. The feature sets generated in this process were each used to build machine learning models, and the performance of these models was evaluated.
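As an illustration of the Fisher’s exact test filter, the sketch below computes a two-sided P value for each fingerprint bit against the active/inactive label and keeps bits below a cutoff. It is a plain-Python sketch, not the R code used in the study: the function names are hypothetical, and the choice of a two-sided test with a strict "<" comparison is an assumption, since the source states only the cutoffs.

```python
from math import comb
from typing import List, Sequence


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact P value for the 2x2 table [[a, b], [c, d]].

    Rows: feature present / absent. Columns: active / inactive.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x actives among feature-present compounds.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables that are at most as likely as the observed one.
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def select_features(fingerprints: Sequence[Sequence[int]],
                    labels: Sequence[int], p_cutoff: float) -> List[int]:
    """Return indices of bits whose P value falls below p_cutoff."""
    selected = []
    for j in range(len(fingerprints[0])):
        a = sum(1 for fp, y in zip(fingerprints, labels) if fp[j] == 1 and y == 1)
        b = sum(1 for fp, y in zip(fingerprints, labels) if fp[j] == 1 and y == 0)
        c = sum(1 for fp, y in zip(fingerprints, labels) if fp[j] == 0 and y == 1)
        d = sum(1 for fp, y in zip(fingerprints, labels) if fp[j] == 0 and y == 0)
        if fisher_exact_two_sided(a, b, c, d) < p_cutoff:
            selected.append(j)
    return selected
```

Running `select_features` once per cutoff (0.01, 0.02, ..., 0.05) would give the five nested feature sets described in the text.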
To evaluate model performance, the dataset was randomly divided into two parts: 70% for training and cross-validation, and 30% for external validation. The training dataset was used to tune the model parameters for maximum performance, and the external validation dataset was used to evaluate how well the model extrapolates to new data. Each model was evaluated by an internal 3-fold cross-validation on the training dataset. To ensure the robustness of our results, the cross-validation process was repeated 20 times with different random data partitions. Because the class distributions of the assay outcomes were imbalanced, the training dataset was rebalanced using four different subsampling methods: up-sampling, down-sampling, Random Over Sampling Examples (ROSE), and Synthetic Minority Over-sampling Technique (SMOTE), via the “ROSE” and “DMwR” packages in R 60, 61. Model performance was evaluated by the AUC-ROC value, which ranges from 0.5 (a random classifier) to 1 (a perfect classifier). The combinations of feature sets, rebalancing strategies, and machine learning algorithms yielded models with different performances, and the model with the optimal performance (i.e., maximum AUC-ROC value) was used for further virtual screening.
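The evaluation scheme above (70/30 split, 3-fold cross-validation repeated 20 times, rebalancing, AUC-ROC scoring) can be sketched in a few helper functions. This is a standard-library Python illustration with hypothetical function names, not the R workflow used in the study. Only down-sampling is shown, ROSE and SMOTE are omitted, and the folds are not stratified, which is an assumption because the source does not say.

```python
import random
from typing import Iterator, List, Sequence, Tuple


def auc_roc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC-ROC as the probability that an active outranks an inactive (ties = 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def split_train_external(n: int, train_fraction: float = 0.7,
                         seed: int = 0) -> Tuple[List[int], List[int]]:
    """Randomly split n compound indices into training and external validation sets."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = round(n * train_fraction)
    return idx[:cut], idx[cut:]


def repeated_kfold(indices: Sequence[int], k: int = 3, repeats: int = 20,
                   seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, held_out) index lists for k-fold CV repeated with new partitions."""
    rng = random.Random(seed)
    for _ in range(repeats):
        shuffled = list(indices)
        rng.shuffle(shuffled)
        folds = [shuffled[i::k] for i in range(k)]
        for i in range(k):
            train = [j for f in folds[:i] + folds[i + 1:] for j in f]
            yield train, folds[i]


def downsample(indices: Sequence[int], labels: Sequence[int], seed: int = 0) -> List[int]:
    """Keep all minority-class indices and an equal-sized random sample of the majority."""
    rng = random.Random(seed)
    pos = [i for i in indices if labels[i] == 1]
    neg = [i for i in indices if labels[i] == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    return minority + rng.sample(majority, len(minority))
```

Rebalancing would typically be applied to the training portion of each fold only, so that the held-out fold keeps the original class distribution; the source does not state this detail explicitly.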
BABM modeling
Two types of bioactivity-based models (BABM-M and BABM-S) and two types of structure-activity combined models (CM-M and CM-S) were built using compound activity profiles from two sets of qHTS assays (MLS, 225 readouts; Sytravon, 130 readouts) 16. The activity-based models were trained and tested using the activity profiles from the NCATS Pharmaceutical Collection (NPC) 20 and the Library of Pharmacologically Active Compounds (LOPAC). The ChemoTyper was used to generate structure fingerprints for all compounds for the structure-activity combined models. In the combined models, the structure fingerprint and the activity profile were concatenated to form a new fingerprint for each compound. For modeling purposes, each compound was represented as a bit vector of 1s and 0s. In a structure fingerprint, the bit value was set to 1 if the compound contained a particular structural feature and 0 if it did not. For activity profile data, each assay readout was treated as a feature, and the feature value was set to 1 for active compounds and 0 for inactive compounds. The detailed modeling process has been described previously 16. Briefly, the Weighted Feature Significance (WFS) method previously developed at NCATS was used to build the models. For each model, compounds were randomly divided into two groups of approximately equal size, one used for training and the other for testing. To evaluate the robustness of the models, this random split, together with model training and testing, was repeated 10 times, generating 10 different training and test sets. Model performance was evaluated by the AUC-ROC value, and the average AUC-ROC over the 10 repetitions was calculated for each model.
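The data representation and resampling scheme for the combined models can be sketched as below. This Python illustration covers only the fingerprint concatenation and the 10 random half splits; the WFS scoring itself is described in reference 16 and is not reproduced here. The function names and the `evaluate` callback are hypothetical.

```python
import random
from statistics import mean
from typing import Callable, Iterator, List, Sequence, Tuple


def combine_fingerprint(structure_bits: Sequence[int],
                        activity_bits: Sequence[int]) -> List[int]:
    """Concatenate a structure fingerprint and an activity profile into one bit vector."""
    return list(structure_bits) + list(activity_bits)


def half_splits(n: int, repeats: int = 10,
                seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield `repeats` random splits of n compounds into two roughly equal groups."""
    rng = random.Random(seed)
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        yield idx[: n // 2], idx[n // 2:]


def mean_auc(evaluate: Callable[[List[int], List[int]], float],
             n: int, repeats: int = 10, seed: int = 0) -> float:
    """Average the AUC-ROC returned by a user-supplied train/test routine over all splits."""
    return mean(evaluate(train, test) for train, test in half_splits(n, repeats, seed))
```

Here `evaluate` stands for any routine that trains a model on the first index list and returns the AUC-ROC on the second.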
Virtual screening of large compound libraries
The optimal models were applied to predict the activity of the NCATS in-house collection of ~360K compounds against viral entry and the 3CL protease. For the QSAR models, each compound was assigned a predicted probability of a specific outcome based on the optimal model, and compounds with the highest probabilities (> 0.5) were collected. To identify novel, structurally diverse compounds, the collected compounds were subjected to a k-means clustering analysis using the Hartigan-Wong algorithm implemented in R, resulting in 100 clusters based on ECFP4 fingerprints. The compounds with the top probabilities in each cluster were selected for experimental validation. For the BABM models, WFS score cutoff values for model-predicted actives were determined using the ROC curves, where both sensitivity and specificity were optimized. Only compounds that scored higher than the cutoff values were considered candidates for follow-up selection. For experimental validation, the selection was driven mostly by the availability of assay resources and physical samples. The top-ranked compounds from each model that met the WFS score cutoff and had physical samples available for cherry picking were selected to fit into one or two 1,536-well plates for testing.
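The QSAR selection step (keep predicted actives, then pick the top-scoring compound per structural cluster) can be sketched as follows. This is a Python illustration with hypothetical names. The k-means clustering itself is not implemented, so cluster assignments are taken as given, and selecting exactly one compound per cluster is an assumption because the source says only that the compounds with top probabilities were chosen.

```python
from typing import Dict, Hashable


def collect_predicted_actives(probabilities: Dict[str, float],
                              threshold: float = 0.5) -> Dict[str, float]:
    """Keep compounds whose predicted probability exceeds the threshold."""
    return {cid: p for cid, p in probabilities.items() if p > threshold}


def top_per_cluster(collected: Dict[str, float],
                    cluster_of: Dict[str, Hashable]) -> Dict[Hashable, str]:
    """Return, for each cluster, the compound with the highest predicted probability."""
    best: Dict[Hashable, str] = {}
    for cid, p in collected.items():
        cluster = cluster_of[cid]
        if cluster not in best or p > collected[best[cluster]]:
            best[cluster] = cid
    return best
```

With 100 clusters, this yields at most 100 structurally diverse candidates for experimental follow-up.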
Statistical Analysis
Example curve plots were fitted using four-parameter logistic regression with the % assay activity as the response and log10 compound concentration as the independent variable using the “drc” statistical package in R. Plots were generated using the “ggplot2” package in R. Representative chemical structures were drawn using the ChemDraw Professional software (version 17.1).
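For reference, the four-parameter logistic model with log10 concentration as the independent variable can be written as a short function. This is a Python sketch of one common textbook parameterization; the parameter names are illustrative, and the “drc” package in R uses its own equivalent parameterization, which is not reproduced here.

```python
def four_parameter_logistic(log10_conc: float, bottom: float, top: float,
                            log10_ac50: float, hill: float) -> float:
    """Predicted % assay activity at a given log10 compound concentration.

    bottom: response at very low concentration
    top: response at very high concentration (for hill > 0)
    log10_ac50: log10 of the concentration giving the half-maximal response
    hill: slope factor controlling the steepness of the curve
    """
    return bottom + (top - bottom) / (1.0 + 10.0 ** (hill * (log10_ac50 - log10_conc)))
```

At `log10_conc == log10_ac50` the function returns the midpoint between `bottom` and `top`, which is the defining property of the AC50.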
ACKNOWLEDGMENTS
This work was supported by the Intramural Research Programs of the National Center for Advancing Translational Sciences, National Institutes of Health. The authors would like to thank the NCATS OpenData Portal team for making the screening data publicly available, and Katlin Recabo, Danielle Bougie and Paul Shinn for assistance with compound management and quality control (QC).
ABBREVIATIONS USED
- 3CL, 3-chymotrypsin-like
- AC50, half-maximal activity concentration
- ACE2, angiotensin converting enzyme 2
- AUC-ROC, area under the receiver operating characteristic curve
- BABM, biological activity-based modeling
- BABM-M, activity-based model (MLS)
- BABM-S, activity-based model (Sytravon)
- CM-M, combined model (SBM + BABM-M)
- CM-S, combined model (SBM + BABM-S)
- COVID-19, coronavirus disease 19
- FN, false negative
- FP, false positive
- HTS, high-throughput screening
- NB, Naïve Bayes
- NNET, neural networks
- NPC, NCATS Pharmaceutical Collection
- PP, pseudotyped particle
- PPV, positive predictive value
- QSAR, quantitative structure-activity relationships
- RBD, receptor-binding domain
- RF, Random Forest
- RFU, relative fluorescence unit
- ROSE, Random Over Sampling Examples
- SARS-CoV-2, severe acute respiratory syndrome coronavirus 2
- SBM, structure (ToxPrint)-based model
- SMOTE, Synthetic Minority Over-sampling Technique
- SVM, support vector machine
- TN, true negative
- TOX, cytotoxicity assay
- TP, true positive
- WFS, Weighted Feature Significance
- XGboost, eXtreme gradient boosting
Footnotes
The authors declare no competing financial interest.
ASSOCIATED CONTENT
Supporting Information
Original drug repurposing assay data, features for the optimal models, model performances, experimental validation, experimental confirmation, and HPLC traces of lead compounds (PDF).
SMILES molecular formula strings (CSV).
REFERENCES
- (1) Yang Y; Peng F; Wang R; Guan K; Jiang T; Xu G; Sun J; Chang C The deadly coronaviruses: The 2003 SARS pandemic and the 2020 novel coronavirus epidemic in China. J. Autoimmun. 2020, 102434.
- (2) Huang C; Wang Y; Li X; Ren L; Zhao J; Hu Y; Zhang L; Fan G; Xu J; Gu X; Cheng Z; Yu T; Xia J; Wei Y; Wu W; Xie X; Yin W; Li H; Liu M; Xiao Y; Gao H; Guo L; Xie J; Wang G; Jiang R; Gao Z; Jin Q; Wang J; Cao B Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China. Lancet (London, England) 2020, 395, 497–506.
- (3) Rothan HA; Byrareddy SN The epidemiology and pathogenesis of coronavirus disease (COVID-19) outbreak. J. Autoimmun. 2020, 109, 102433.
- (4) Lan J; Ge J; Yu J; Shan S; Zhou H; Fan S; Zhang Q; Shi X; Wang Q; Zhang L Structure of the SARS-CoV-2 spike receptor-binding domain bound to the ACE2 receptor. Nature 2020, 581, 215–220.
- (5) Yan R; Zhang Y; Li Y; Xia L; Guo Y; Zhou Q Structural basis for the recognition of SARS-CoV-2 by full-length human ACE2. Science 2020, 367, 1444–1448.
- (6) V’kovski P; Kratzel A; Steiner S; Stalder H; Thiel V Coronavirus biology and replication: implications for SARS-CoV-2. Nat. Rev. Microbiol. 2020, 1–16.
- (7) Hoffmann M; Kleine-Weber H; Schroeder S; Krüger N; Herrler T; Erichsen S; Schiergens TS; Herrler G; Wu N-H; Nitsche A SARS-CoV-2 Cell Entry Depends on ACE2 and TMPRSS2 and Is Blocked by a Clinically Proven Protease Inhibitor. Cell 2020, 181, 271–280.e8.
- (8) Jin Z; Du X; Xu Y; Deng Y; Liu M; Zhao Y; Zhang B; Li X; Zhang L; Peng C; Duan Y; Yu J; Wang L; Yang K; Liu F; Jiang R; Yang X; You T; Liu X; Yang X; Bai F; Liu H; Liu X; Guddat LW; Xu W; Xiao G; Qin C; Shi Z; Jiang H; Rao Z; Yang H Structure of M(pro) from SARS-CoV-2 and discovery of its inhibitors. Nature 2020, 582, 289–293.
- (9) Chen CZ; Xu M; Pradhan M; Gorshkov K; Petersen JD; Straus MR; Zhu W; Shinn P; Guo H; Shen M; Klumpp-Thomas C; Michael SG; Zimmerberg J; Zheng W; Whittaker GR Identifying SARS-CoV-2 Entry Inhibitors through Drug Repurposing Screens of SARS-S and MERS-S Pseudotyped Particles. ACS Pharmacol. Transl. Sci. 2020, 3(6), 1165–1175.
- (10) Brimacombe KR; Zhao T; Eastman RT; Hu X; Wang K; Backus M; Baljinnyam B; Chen CZ; Chen L; Eicher T An OpenData portal to share COVID-19 drug repurposing data in real time. BioRxiv 2020.
- (11) Zhu W; Xu M; Chen CZ; Guo H; Shen M; Hu X; Shinn P; Klumpp-Thomas C; Michael SG; Zheng W Identification of SARS-CoV-2 3CL Protease Inhibitors by a Quantitative High-throughput Screening. ACS Pharmacol. Transl. Sci. 2020, 3(5), 1008–1016.
- (12) Shoichet BK Virtual screening of chemical libraries. Nature 2004, 432, 862–865.
- (13) Gloriam DE Bigger is better in virtual drug screens. Nature 2019, 566, 193–194.
- (14) Ekins S; Puhl AC; Zorn KM; Lane TR; Russo DP; Klein JJ; Hickey AJ; Clark AM Exploiting machine learning for end-to-end drug discovery and development. Nat. Mater. 2019, 18, 435–441.
- (15) Adeshina YO; Deeds EJ; Karanicolas J Machine learning classification can reduce false positives in structure-based virtual screening. Proc. Natl. Acad. Sci. U S A 2020, 117, 18477–18488.
- (16) Huang R; Xu M; Zhu H; Chen CZ; Zhu W; Lee EM; He S; Zhang L; Zhao J; Shamim K; Bougie D; Huang W; Xia M; Hall MD; Lo D; Simeonov A; Austin CP; Qiu X; Tang H; Zheng W Biological activity-based modeling identifies antiviral leads against SARS-CoV-2. Nat. Biotechnol. 2021, 1–7.
- (17) Hansch C Quantitative approach to biochemical structure-activity relationships. Acc. Chem. Res. 1969, 2, 232–239.
- (18) Zhang R; Li X; Zhang X; Qin H; Xiao W Machine learning approaches for elucidating the biological effects of natural products. Nat. Prod. Rep. 2020, 38(2), 346–361.
- (19) Huang R; Xia M; Sakamuru S; Zhao J; Shahane SA; Attene-Ramos M; Zhao T; Austin CP; Simeonov A Modelling the Tox21 10 K chemical profiles for in vivo toxicity prediction and mechanism characterization. Nat. Commun. 2016, 7, 1–10.
- (20) Huang R; Southall N; Wang Y; Yasgar A; Shinn P; Jadhav A; Nguyen DT; Austin CP The NCGC pharmaceutical collection: a comprehensive resource of clinically approved drugs enabling repurposing and chemical genomics. Sci. Transl. Med. 2011, 3, 80ps16.
- (21) Zhang ZR; Zhang YN; Li XD; Zhang HQ; Xiao SQ; Deng F; Yuan ZM; Ye HQ; Zhang B A cell-based large-scale screening of natural compounds for inhibitors of SARS-CoV-2. Signal. Transduct. Target. Ther. 2020, 5, 1–3.
- (22) Chen CZ; Shinn P; Itkin Z; Eastman RT; Bostwick R; Rasmussen L; Huang R; Shen M; Hu X; Wilson KM; Brooks BM; Guo H; Zhao T; Klump-Thomas C; Simeonov A; Michael SG; Lo DC; Hall MD; Zheng W Drug Repurposing Screen for Compounds Inhibiting the Cytopathic Effect of SARS-CoV-2. Front. Pharmacol. 2021, 11, 592737–592737.
- (23) Rogers D; Hahn M Extended-connectivity fingerprints. J. Chem. Inf. Model. 2010, 50, 742–54.
- (24) Anderson GB; Oleson KW; Jones B; Peng RD Classifying heatwaves: Developing health-based models to predict high-mortality versus moderate United States heatwaves. Clim. Change 2018, 146, 439–453.
- (25) Lv Z; Jin S; Ding H; Zou Q A Random Forest Sub-Golgi Protein Classifier Optimized via Dipeptide and Amino Acid Composition Features. Front. Bioeng. Biotechnol. 2019, 7, 215.
- (26) Gong J; Liu J; Hao W; Nie S; Wang S; Peng W Computer-aided diagnosis of ground-glass opacity pulmonary nodules using radiomic features analysis. Phys. Med. Biol. 2019, 64, 135015.
- (27) Xu T; Ngan DK; Ye L; Xia M; Xie HQ; Zhao B; Simeonov A; Huang R Predictive Models for Human Organ Toxicity Based on In Vitro Bioactivity Data and Chemical Structure. Chem. Res. Toxicol. 2020, 33, 731–741.
- (28) Kc GB; Bocci G; Verma S; Hassan MM; Holmes J; Yang JJ; Sirimulla S; Oprea TI A machine learning platform to estimate anti-SARS-CoV-2 activities. Nat. Mach. Intell. 2021, 3, 527–535.
- (29) Liu J; Mansouri K; Judson RS; Martin MT; Hong H; Chen M; Xu X; Thomas RS; Shah I Predicting hepatotoxicity using ToxCast in vitro bioactivity and chemical structure. Chem. Res. Toxicol. 2015, 28, 738–751.
- (30) Vankadara S; Wong YX; Liu B; See YY; Tan LH; Tan QW; Wang G; Karuna R; Guo X; Tan ST; Fong JY; Joy J; Chia CSB A head-to-head comparison of the inhibitory activities of 15 peptidomimetic SARS-CoV-2 3CLpro inhibitors. Bioorg. Med. Chem. Lett. 2021, 48, 128263.
- (31) Dampalla CS; Kim Y; Bickmeier N; Rathnayake AD; Nguyen HN; Zheng J; Kashipathy MM; Baird MA; Battaile KP; Lovell S; Perlman S; Chang KO; Groutas WC Structure-Guided Design of Conformationally Constrained Cyclohexane Inhibitors of Severe Acute Respiratory Syndrome Coronavirus-2 3CL Protease. J. Med. Chem. 2021, 64, 10047–10058.
- (32) Bai B; Belovodskiy A; Hena M; Kandadai AS; Joyce MA; Saffran HA; Shields JA; Khan MB; Arutyunova E; Lu J; Bajwa SK; Hockman D; Fischer C; Lamer T; Vuong W; van Belkum MJ; Gu Z; Lin F; Du Y; Xu J; Rahim M; Young HS; Vederas JC; Tyrrell DL; Lemieux MJ; Nieman JA Peptidomimetic α-Acyloxymethylketone Warheads with Six-Membered Lactam P1 Glutamine Mimic: SARS-CoV-2 3CL Protease Inhibition, Coronavirus Antiviral Activity, and in Vitro Biological Stability. J. Med. Chem. 2021.
- (33) Zhang CH; Stone EA; Deshmukh M; Ippolito JA; Ghahremanpour MM; Tirado-Rives J; Spasov KA; Zhang S; Takeo Y; Kudalkar SN; Liang Z; Isaacs F; Lindenbach B; Miller SJ; Anderson KS; Jorgensen WL Potent Noncovalent Inhibitors of the Main Protease of SARS-CoV-2 from Molecular Sculpting of the Drug Perampanel Guided by Free Energy Perturbation Calculations. ACS Cent. Sci. 2021, 7, 467–475.
- (34) Ghosh AK; Raghavaiah J; Shahabi D; Yadav M; Anson BJ; Lendy EK; Hattori SI; Higashi-Kuwata N; Mitsuya H; Mesecar AD Indole Chloropyridinyl Ester-Derived SARS-CoV-2 3CLpro Inhibitors: Enzyme Inhibition, Antiviral Efficacy, Structure-Activity Relationship, and X-ray Structural Studies. J. Med. Chem. 2021, 64, 14702–14714.
- (35) Xia Z; Sacco M; Hu Y; Ma C; Meng X; Zhang F; Szeto T; Xiang Y; Chen Y; Wang J Rational Design of Hybrid SARS-CoV-2 Main Protease Inhibitors Guided by the Superimposed Cocrystal Structures with the Peptidomimetic Inhibitors GC-376, Telaprevir, and Boceprevir. ACS Pharmacol. Transl. Sci. 2021, 4, 1408–1421.
- (36) Boras B; Jones RM; Anson BJ; Arenson D; Aschenbrenner L; Bakowski MA; Beutler N; Binder J; Chen E; Eng H; Hammond H; Hammond J; Haupt RE; Hoffman R; Kadar EP; Kania R; Kimoto E; Kirkpatrick MG; Lanyon L; Lendy EK; Lillis JR; Logue J; Luthra SA; Ma C; Mason SW; McGrath ME; Noell S; Obach RS; MN OB; O’Connor R; Ogilvie K; Owen D; Pettersson M; Reese MR; Rogers TF; Rosales R; Rossulek MI; Sathish JG; Shirai N; Steppan C; Ticehurst M; Updyke LW; Weston S; Zhu Y; White KM; García-Sastre A; Wang J; Chatterjee AK; Mesecar AD; Frieman MB; Anderson AS; Allerton C Preclinical characterization of an intravenous coronavirus 3CL protease inhibitor for the potential treatment of COVID19. Nat. Commun. 2021, 12, 6055.
- (37) Qiao J; Li YS; Zeng R; Liu FL; Luo RH; Huang C; Wang YF; Zhang J; Quan B; Shen C; Mao X; Liu X; Sun W; Yang W; Ni X; Wang K; Xu L; Duan ZL; Zou QC; Zhang HL; Qu W; Long YH; Li MH; Yang RC; Liu X; You J; Zhou Y; Yao R; Li WP; Liu JM; Chen P; Liu Y; Lin GF; Yang X; Zou J; Li L; Hu Y; Lu GW; Li WM; Wei YQ; Zheng YT; Lei J; Yang S SARS-CoV-2 M(pro) inhibitors with antiviral activity in a transgenic mouse model. Science 2021, 371, 1374–1378.
- (38) Csermely P; Agoston V; Pongor S The efficiency of multi-target drugs: the network approach might help drug design. Trends Pharmacol. Sci. 2005, 26, 178–82.
- (39) Ramsay RR; Popovic-Nikolic MR; Nikolic K; Uliassi E; Bolognesi ML A perspective on multi-target drug discovery and design for complex diseases. Clin. Transl. Med. 2018, 7, 3–3.
- (40) Chen N; Zhou M; Dong X; Qu J; Gong F; Han Y; Qiu Y; Wang J; Liu Y; Wei Y; Xia J; Yu T; Zhang X; Zhang L Epidemiological and clinical characteristics of 99 cases of 2019 novel coronavirus pneumonia in Wuhan, China: a descriptive study. Lancet (London, England) 2020, 395, 507–513.
- (41) Shyr ZA; Gorshkov K; Chen CZ; Zheng W Drug discovery strategies for SARS-CoV-2. J. Pharmacol. Exp. Ther. 2020, JPET-MR-2020–000123.
- (42) Xu T; Zheng W; Huang R High-throughput screening assays for SARS-CoV-2 drug development: current status and future directions. Drug Discov. Today 2021, 26, 2439–2444.
- (43) Sun Q; Ye F; Liang H; Liu H; Li C; Lu R; Huang B; Zhao L; Tan W; Lai L Bardoxolone and bardoxolone methyl, two Nrf2 activators in clinical trials, inhibit SARS-CoV-2 replication and its 3C-like protease. Signal. Transduct. Target. Ther. 2021, 6, 1–3.
- (44) Ohashi H; Watashi K; Saso W; Shionoya K; Iwanami S; Hirokawa T; Shirai T; Kanaya S; Ito Y; Kim KS Multidrug treatment with nelfinavir and cepharanthine against COVID-19. BioRxiv 2020.
- (45) Jeon S; Ko M; Lee J; Choi I; Byun SY; Park S; Shum D; Kim S Identification of Antiviral Drug Candidates against SARS-CoV-2 from FDA-Approved Drugs. Antimicrob. Agents Chemother. 2020, 64, e00819–20.
- (46) Wilkinson T; Dixon R; Page C; Carroll M; Griffiths G; Ho L-P; De Soyza A; Felton T; Lewis KE; Phekoo K; Chalmers JD; Gordon A; McGarvey L; Doherty J; Read RC; Shankar-Hari M; Martinez-Alier N; O’Kelly M; Duncan G; Walles R; Sykes J; Summers C; Singh D; on behalf of the, A. C. ACCORD: A Multicentre, Seamless, Phase 2 Adaptive Randomisation Platform Study to Assess the Efficacy and Safety of Multiple Candidate Agents for the Treatment of COVID-19 in Hospitalised Patients: A structured summary of a study protocol for a randomised controlled trial. Trials 2020, 21, 691.
- (47) Bocci G; Bradfute SB; Ye C; Garcia MJ; Parvathareddy J; Reichard W; Surendranathan S; Bansal S; Bologa CG; Perkins DJ Virtual and In Vitro Antiviral Screening Revive Therapeutic Drugs for COVID-19. ACS Pharmacol. Transl. Sci. 2020, 3, 1278–1292.
- (48) Weston S; Coleman CM; Haupt R; Logue J; Matthews K; Li Y; Reyes HM; Weiss SR; Frieman MB Broad anti-coronavirus activity of Food and Drug Administration-approved drugs against SARS-CoV-2 in vitro and SARS-CoV in vivo. J. Virol. 2020, 94, e01218–20.
- (49) Mengue JB; Held J; Kreidenweiss A AQ-13-an investigational antimalarial drug. Expert. Opin. Investig. Drugs 2019, 28, 217–222.
- (50) Souroullas GP; Jeck WR; Parker JS; Simon JM; Liu J-Y; Paulk J; Xiong J; Clark KS; Fedoriw Y; Qi J; Burd CE; Bradner JE; Sharpless NE An oncogenic Ezh2 mutation induces tumors through global redistribution of histone 3 lysine 27 trimethylation. Nat. Med. 2016, 22, 632–640.
- (51) Benchaoui HA; Cox SR; Schneider RP; Boucher JF; Clemence RG The pharmacokinetics of maropitant, a novel neurokinin type-1 receptor antagonist, in dogs. J. Vet. Pharmacol. Ther. 2007, 30, 336–44.
- (52) Li JJ; Carson KG; Trivedi BK; Yue WS; Ye Q; Glynn RA; Miller SR; Connor DT; Roth BD; Luly JR; Low JE; Heilig DJ; Yang W; Qin S; Hunt S Synthesis and structure-activity relationship of 2-amino-3-heteroaryl-quinoxalines as non-peptide, small-molecule antagonists for interleukin-8 receptor. Bioorg. Med. Chem. 2003, 11, 3777–3790.
- (53) Sander P; Mostafa H; Soboh A; Schneider JM; Pala A; Baron A-K; Moepps B; Wirtz CR; Georgieff M; Schneider M Vacquinol-1 inducible cell death in glioblastoma multiforme is counter regulated by TRPM7 activity induced by exogenous ATP. Oncotarget 2017, 8, 35124–35137.
- (54) Lin J; Zheng Z; Li Y; Yu W; Zhong W; Tian S; Zhao F; Ren X; Xiao J; Wang N; Liu S; Wang L; Sheng F; Chen Y; Jin C; Li S; Xia B A novel Bcl-XL inhibitor Z36 that induces autophagic cell death in Hela cells. Autophagy 2009, 5, 314–320.
- (55) Inglese J; Auld DS; Jadhav A; Johnson RL; Simeonov A; Yasgar A; Zheng W; Austin CP Quantitative high-throughput screening: a titration-based approach that efficiently identifies biological activities in large chemical libraries. Proc. Natl. Acad. Sci. U S A 2006, 103, 11473–8.
- (56) Wang Y; Jadhav A; Southal N; Huang R; Nguyen DT A grid algorithm for high throughput fitting of dose-response curve data. Curr. Chem. Genomics 2010, 4, 57–66.
- (57) Huang R A Quantitative High-Throughput Screening Data Analysis Pipeline for Activity Profiling. In High-Throughput Screening Assays in Toxicology, 1 ed.; Zhu H; Xia M, Eds. Humana Press: 2016; Vol. 1473.
- (58) Chen T; He T; Benesty M; Khotilovich V; Tang Y Xgboost: extreme gradient boosting. R Package Version 0.4–2 2015, 1–4.
- (59) Robin X; Turck N; Hainard A; Tiberti N; Lisacek F; Sanchez J-C; Müller M pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinform. 2011, 12, 77.
- (60) Lunardon N; Menardi G; Torelli N ROSE: A Package for Binary Imbalanced Learning. R Journal 2014, 6.
- (61) Torgo L; Torgo ML Package ‘DMwR’. Comprehensive R Archive Network 2013.
