Abstract
Motivation
Benchmarking is essential for the improvement and comparison of drug discovery platforms. We revised the protocols used to benchmark our Computational Analysis of Novel Drug Opportunities (CANDO) multiscale therapeutic discovery platform to bring them into strong alignment with best practices.
Results
CANDO ranked 7.4% and 12.1% of known drugs in the top 10 compounds for their respective diseases/indications using drug-indication mappings from the Comparative Toxicogenomics Database (CTD) and Therapeutic Targets Database (TTD), respectively. Performance was weakly positively correlated (Spearman correlation coefficient > 0.3) with the number of drugs associated with an indication and moderately correlated (coefficient > 0.5) with intra-indication chemical similarity. There was also a moderate correlation between performance on our original and new benchmarking protocols. Better performance was observed when using TTD instead of CTD when drug-indication associations appearing in both mappings were assessed.
Availability and implementation
CANDO is available at https://github.com/ram-compbio/CANDO. The version used in this article is available at http://compbio.buffalo.edu/data/mc_cando_benchmarking2.
Keywords: drug discovery, drug repurposing, proteomics, bioinformatics, benchmarking
1 Introduction
Drug discovery is difficult: according to a 2010 estimate, 24.3 early “target-to-hit” projects were completed per approved drug (Paul et al. 2010). These preclinical projects were estimated to account for between 31% and 43% of total drug discovery expenditure (Paul et al. 2010, DiMasi et al. 2016). The result is a high and increasing price for novel drug development, with estimates ranging from $985 million to over $2 billion for one new drug to be successfully brought to market (Mullard 2014, DiMasi et al. 2016, Wouters et al. 2020). The creation of more effective computational drug discovery platforms promises to reduce the failure rate and increase the cost-effectiveness of drug discovery (Zhang et al. 2022c, Sadybekov and Katritch 2023). Thousands of articles have been published on this topic, and multiple drugs discovered and/or optimized through computational methods are already in use (Talele et al. 2010, Shaker et al. 2021). Modern drug discovery techniques range from traditional single-target molecular docking and retrospective clinical analysis to newer signature matching, network/pathway mapping, and deep learning pipelines (Pushpakom et al. 2019, Zhu 2020, Galindez et al. 2021). The successes and failures of novel and repurposed therapeutics in combating the COVID-19 pandemic made clearer than ever that robust and effective drug discovery pipelines are essential for modern healthcare (Galindez et al. 2021, Muratov et al. 2021, Tayara et al. 2021, Li et al. 2023). Still, systems for the assessment and comparison of these computational platforms need improvement and standardization (Schuler et al. 2022).
For this study, we define a drug discovery platform as consisting of one or more pipelines, themselves comprising protocols (individual processes like predicting drug-target interactions), that come together to predict novel drug candidates for different diseases/indications. Benchmarking is the process of assessing the utility of such platforms, pipelines, and protocols (Peters et al. 2018, Weber et al. 2019). Quality benchmarking assists in (i) designing and refining computational pipelines; (ii) estimating the likelihood of success in practical predictions; and (iii) choosing the most suitable pipeline for a specific scenario. Guidelines, benchmarks, and head-to-head competitions have been published for topics including drug property, protein structure, and drug-target interaction prediction, but few manuscripts have explored benchmarking strategies for drug-indication association prediction platforms (Moult et al. 1995, Brown and Patel 2018, Wu et al. 2018, Meyer and Saez-Rodriguez 2021, Schuler et al. 2022, Schulman et al. 2024, Tanoli et al. 2025b). Key publications include those that created datasets that are used for benchmarking, such as Cdataset, PREDICT, and LRSSL (Gottlieb et al. 2011, Luo et al. 2016, Liang et al. 2017). Such static datasets are used instead of or alongside continuously updated databases like Drugbank, the Comparative Toxicogenomics Database (CTD), and Therapeutic Targets Database (TTD) (Wishart et al. 2006, Tanoli et al. 2021a, b, 2025a, Davis et al. 2023, Zhou et al. 2024). The limited availability of guidance on drug discovery benchmarking alongside an abundance of data sources for said benchmarking has resulted in the proliferation of numerous different benchmarking practices across different publications (Brown and Patel 2018, Tanoli et al. 2021a, b, 2025a, Schuler et al. 2022).
Most drug discovery benchmarking protocols start with a ground truth mapping of drugs to associated indications, though numerous “ground truths” are currently in use (Brown and Patel 2018). Data splitting is also frequently required. K-fold cross-validation is very commonly employed (Gottlieb et al. 2011, Wang et al. 2013a, Wang et al. 2014, Martinez et al. 2015, Luo et al. 2016, Liang et al. 2017, Luo et al. 2018, Vlietstra et al. 2018, Zhang et al. 2018a,b, Jiang et al. 2019, Wang et al. 2019a,b, Yang et al. 2019a,b, Zeng et al. 2019, Fahimian et al. 2020, Tang et al. 2021, Wang et al. 2020, Zhang et al. 2020, Zhou et al. 2020, Cai et al. 2021, Gao et al. 2022, Meng et al. 2021, Peng et al. 2021, Su et al. 2021, Yi et al. 2021, Yu et al. 2021, Zheng and Wu 2021, Wang et al. 2021b, Xie et al. 2021, Yang et al. 2021, Zhao et al. 2022a,b, Gu et al. 2022, Huang et al. 2022, John et al. 2022, Meng et al. 2022, Su et al. 2022, Sun et al. 2022, Yan et al. 2022, Zhang et al. 2022a,b, Yang and Chen 2022, Huang et al. 2023, Ianevski et al. 2024, Meng et al. 2024, Park and Cho 2024, Zhao et al. 2024). Training/testing splits, leave-one-out protocols, or “temporal splits” (splitting based on approval dates) are also used occasionally (Li et al. 2020, Mangione et al. 2020a, Rifaioglu et al. 2020, Crisan et al. 2021, John et al. 2022). Results are then encapsulated in various metrics (Schuler et al. 2022). Area under the receiver-operating characteristic curve and area under the precision–recall curve are commonly used (Gottlieb et al. 2011, Wang et al. 2013a,b, Cheng et al. 2014, Wang et al. 2014, Martinez et al. 2015, Luo et al. 2016, Liang et al. 2017, Luo et al. 2018, Vlietstra et al. 2018, Zhang et al. 2018a,b, Jiang et al. 2019, Wang et al. 2019a,b, Yang et al. 2019a,b, Zeng et al. 2019, Fahimian et al. 2020, Lin et al. 2020, Rifaioglu et al. 2020, Tang et al. 2021, Wang et al. 2020, Zhang et al. 2020, Zhou et al. 2020, Zhu et al. 2020, Cai et al. 2021, Crisan et al. 
2021, Fiscon et al. 2021, Gao et al. 2022, Han et al. 2021, Lin et al. 2021, Meng et al. 2021, Peng et al. 2021, Su et al. 2021, Yi et al. 2021, Wang et al. 2021b, Xie et al. 2021, Yang et al. 2021, Yu et al. 2021, Zheng and Wu 2021, Guala and Sonnhammer 2022, Gu et al. 2022, John et al. 2022, Huang et al. 2022, Meng et al. 2022, Su et al. 2022, Sun et al. 2022, Wu et al. 2022a,b, Yang and Chen 2022, Yang et al. 2022, Zhang et al. 2022a,b, Zhao et al. 2022a,b, Huang et al. 2023, Kang et al. 2023, Zong et al. 2025, Madushanka et al. 2024, Meng et al. 2024, Park and Cho 2024, Zhao et al. 2024). However, their relevance to drug discovery has been questioned (Cheng et al. 2013, Lin et al. 2020, Schuler et al. 2022). Interpretable metrics like recall, precision, and accuracy above a threshold are also frequently reported (Zhang and Gant 2008, Gottlieb et al. 2011, Wang et al. 2013a, Wang et al. 2014, Liang et al. 2017, Luo et al. 2018, Varsou et al. 2018, Yu et al. 2018, Zhang et al. 2018a,b, Zhou et al. 2020, Crisan et al. 2021, Lucchetta and Pellegrini 2021, Peng et al. 2021, Yu et al. 2021, Guala and Sonnhammer 2022, Su et al. 2022, Wu et al. 2022b, Huang et al. 2023, Zhao et al. 2024). Case studies often appear alongside quantitative assessments to provide a tangible confirmation of predictive power (Chiang and Butte 2009, Hu and Agarwal 2009, Wang et al. 2013a, Wang et al. 2014, Martinez et al. 2015, Liang et al. 2017, Luo et al. 2018, Zhang et al. 2018a, Jiang et al. 2019, Yang et al. 2019a, Yu and Gao 2019, Wang et al. 2019a,b, Zeng et al. 2019, Lin et al. 2020, Tang et al. 2021, Wang et al. 2020, Zhou et al. 2020, Cai et al. 2021, Crisan et al. 2021, Fiscon et al. 2021, Gao et al. 2022, Han et al. 2021, Meng et al. 2021, Peng et al. 2021, Su et al. 2021, Xie et al. 2021, Yang et al. 2021, Yi et al. 2021, Yu et al. 2021, Wang et al. 2021a, Ciriaco et al. 2022, John et al. 2022, Meng et al. 2022, Su et al. 2022, Sun et al. 2022, Wu et al. 2022a,b, Wu et al. 
2023, Zhang et al. 2022b, Zhao et al. 2022a,b, Huang et al. 2023, Kang et al. 2023, Madushanka et al. 2024, Huang et al. 2024, Ianevski et al. 2024, Meng et al. 2024, Park and Cho 2024, Shen et al. 2024, Zhao et al. 2024). This wide variation in benchmarking can make determining best practices for assessing a given drug discovery protocol difficult, but ensuring a maximally informative and minimally biased assessment remains a necessity.
We developed the Computational Analysis of Novel Drug Opportunities (CANDO) platform based on the hypothesis that drugs with similar multitarget protein interaction profiles have similar biological effects (Minie et al. 2014, Sethi et al. 2015, Chopra and Samudrala 2016, Chopra et al. 2016, Falls et al. 2019, Fine et al. 2019, Mangione and Samudrala 2019, Schuler and Samudrala 2019, Mangione et al. 2020a,b, Hudson and Samudrala 2021, Overhoff et al. 2021, Mammen et al. 2022, Mangione et al. 2022, Moukheiber et al. 2022, Schuler et al. 2022, Bruggemann et al. 2023, Mangione et al. 2023). CANDO calculates all-against-all interaction signature similarities to predict drug candidates, and it has been extensively validated (Jenwitheesuk and Samudrala 2005, Jenwitheesuk et al. 2008, Costin et al. 2010, Nicholson et al. 2011, Michael et al. 2013, 2014, Minie et al. 2014, Chopra et al. 2016, Fine et al. 2020, Mangione et al. 2020a,b, Chatrikhi et al. 2021, Falls et al. 2022, Palanikumar et al. 2021, Mammen et al. 2022, Mangione et al. 2022, Schuler et al. 2022, Bruggemann et al. 2023). Previous efforts to benchmark CANDO have focused on assessing its ability to generate useful drug–drug similarity lists, which are used to generate novel drug predictions (Mangione and Samudrala 2019, Schuler and Samudrala 2019, Mangione et al. 2020a, Schuler et al. 2022).
Our goal for this study is to bring the benchmarking protocols of our drug discovery platform into strong alignment with best practices. We thus updated our internal benchmarking protocol to evaluate the predictions generated by CANDO based on a consensus of the previously assessed similarity lists. This revision allowed us to optimize parameters used in CANDO and examine the influence of certain features on its performance. Using the updated protocols and the parameters thus optimized should result in improved performance.
2 Materials and methods
2.1 Drug discovery using CANDO
The CANDO multiscale drug discovery platform predicts novel compounds for diseases/indications based on their multitarget interaction signatures. It has been described extensively in other publications (Minie et al. 2014, Chopra and Samudrala 2016, Falls et al. 2019, Mangione and Samudrala 2019, Schuler and Samudrala 2019, Mangione et al. 2020a,b, Hudson and Samudrala 2021, Schuler et al. 2022, Mangione et al. 2023). CANDO consists of multiple pipelines designed for different prediction scenarios; some require human prioritization of drug targets, like our multitarget screening pipeline, and others require that the indication being predicted for is associated with at least one drug, like the primary pipeline of CANDO. We assessed this latter pipeline in this article. In the primary pipeline, the interaction signatures of every compound are compared to those of every other compound under the hypothesis that compounds with similar signatures exhibit similar behaviors. Each compound is thus associated with a sorted “similarity list” containing every other compound ranked by signature similarity. We calculated compound-compound signature similarity as the root mean squared distance between their proteomic interaction signatures (vectors of compound-protein scores) in this study using rapid, parallelizable algorithms from scikit-learn (Pedregosa et al. 2011, Mangione et al. 2020a).
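The all-against-all similarity calculation described above can be sketched as follows. This is an illustrative NumPy version (the platform itself uses parallelizable scikit-learn routines), and the function name is ours:

```python
import numpy as np

def rank_by_rmsd(signatures):
    """All-against-all RMSD between proteomic interaction signatures.

    signatures: (n_compounds, n_proteins) array of compound-protein
    interaction scores. Returns the distance matrix and, per compound,
    a similarity list of all other compounds sorted by ascending distance.
    """
    diffs = signatures[:, None, :] - signatures[None, :, :]
    rmsd = np.sqrt((diffs ** 2).mean(axis=2))
    order = np.argsort(rmsd, axis=1)
    # drop each compound from its own similarity list
    lists = [[j for j in order[i] if j != i] for i in range(len(signatures))]
    return rmsd, lists
```

In practice the signature matrix has thousands of compounds and proteins, so a memory-efficient pairwise distance routine would replace the broadcasting step.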
CANDO combines multiple similarity lists into novel drug predictions for an indication via a consensus protocol: (i) The similarity lists of drugs associated with the indication are examined. (ii) The most similar compounds to each associated drug are ranked. (iii) A consensus score is assigned to each compound based on the number of similar lists in which it appears above a certain rank (the similarity list cutoff). Average rank in those lists is also calculated. (iv) Compounds are sorted by consensus score and average rank above the similarity list cutoff. The best ranked compounds in this consensus list are the top predictions for an indication. This pipeline is summarized in Fig. 1.
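The four consensus steps above can be sketched in a few lines of Python. This is a simplified illustration, not the CANDO implementation, and all names are ours:

```python
from collections import defaultdict

def consensus_predict(similarity_lists, associated_drugs, cutoff):
    """Rank candidate compounds for an indication by consensus.

    similarity_lists: dict mapping each drug to its similarity list
    (compounds ordered from most to least similar).
    """
    count = defaultdict(int)   # number of similarity lists containing the compound
    ranks = defaultdict(list)  # ranks within those lists, above the cutoff
    for drug in associated_drugs:
        for rank, comp in enumerate(similarity_lists[drug][:cutoff], start=1):
            count[comp] += 1
            ranks[comp].append(rank)
    # sort by descending consensus score, breaking ties by ascending average rank
    return sorted(count, key=lambda c: (-count[c], sum(ranks[c]) / len(ranks[c])))
```

Mirroring the Fig. 1 example, a compound appearing in the most lists ranks first, and compounds with equal consensus scores are ordered by their average rank.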
Figure 1.
Flowchart of CANDO prediction and benchmarking pipelines. The primary prediction pipeline of CANDO is shown. Data sources are represented by their respective logos. COACH is used to predict protein binding sites based on experimental structures from the Protein Data Bank (PDB) and/or computational models created via I-TASSER (Berman et al. 2000, Zhang 2008, Xu et al. 2011, Yang et al. 2015). Predicted ligands and confidence scores for each binding site are combined with compound fingerprints (from RDKit) to calculate protein interaction scores for every small molecule in the compound library (from DrugBank) using the bioanalytic docking (BANDOCK) protocol (Wishart et al. 2006, Landrum 2010). These interaction scores are arranged into interaction signatures for every compound. Drug–drug signature similarity scores are calculated from these signatures. Drug-indication mappings are extracted from the CTD and/or TTD, and the most similar compounds to each drug associated with an indication are examined. Our original benchmarking protocol assessed the resulting similarity lists. Novel compound predictions are generated and ranked based on the number of times a compound appears in these lists above the similarity list cutoff; ties are broken based on average rank in these lists. In the example, the yellow compound is first because it appears the most times, and the cyan compound is second because its average rank is better than that of the magenta compound. The new benchmarking protocol uses the same scoring as the prediction pipeline, but each compound is left out in turn from its respective indication(s). The dark blue compound is left out of this example and then predicted with a rank of two based on the remaining indicated drugs. The original and new benchmarking protocols differ in what is assessed: the original focuses on the individual similarity lists, whereas the new evaluates the final consensus list.
2.2 Data extraction and generation
Proteomic interaction signatures were created based on the CANDO version 2.5 compound and human protein libraries. The protein library comprised 8385 nonredundant human protein structures, including 5316 experimentally determined structures extracted from the Protein Data Bank and 3069 models generated using I-TASSER version 5.1 (Berman et al. 2000, Xu et al. 2011, Zhang 2008, Yang et al. 2015, Mangione et al. 2023). Our bioanalytic docking (BANDOCK) protocol uses binding site data to calculate compound–protein interaction scores. We used the COACH pipeline to generate these data for our protein library (Yang et al. 2013). COACH compared potential binding sites with solved bound protein structures to calculate binding site similarity scores and likely ligands (Yang et al. 2013, Mangione et al. 2022). The chemical similarity between each compound in our library and the most similar predicted ligand was calculated using Extended Connectivity Fingerprints with a diameter of 4 (ECFP4) generated by RDKit (https://rdkit.org/; Landrum 2010, Schuler and Samudrala 2019, Mangione et al. 2022). Compound–protein interaction scores were then calculated in three ways: (i) as the chemical similarity score (the compound-only or C score), (ii) as the product of the chemical similarity score and the binding site similarity score (the compound-and-protein or CxP score), or (iii) as the product of the percentile chemical similarity score and the protein binding score (the percentile compound-and-protein or dCxP score). We compared all three interaction scoring types in our parameter optimization study (Section 2.4); the second score was used for our predictive power assessment.
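The three scoring types might be expressed as a single dispatch function. This is a sketch assuming the chemical similarity, its percentile, and the COACH binding site score are already available; all names are hypothetical:

```python
def interaction_score(chem_sim, chem_pct, site_score, mode="CxP"):
    """Compound-protein interaction score under the three BANDOCK modes.

    chem_sim:   chemical similarity to the best-matching predicted ligand
    chem_pct:   percentile of that chemical similarity score
    site_score: COACH binding site similarity (confidence) score
    """
    if mode == "C":      # compound-only
        return chem_sim
    if mode == "CxP":    # compound-and-protein
        return chem_sim * site_score
    if mode == "dCxP":   # percentile compound-and-protein
        return chem_pct * site_score
    raise ValueError(f"unknown scoring mode: {mode}")
```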
Benchmarking requires known drug-indication mappings, which we obtained from two sources. We combined drug approval data from DrugBank and drug-indication associations from the CTD to make the “CTD mapping,” which is also available in version 2.5 of CANDO (Wishart et al. 2006, Davis et al. 2023). The “TTD mapping” was created from approved drug-indication associations downloaded from the TTD; only compounds with existing interaction signatures were included (Zhou et al. 2024). In total, there were: 2449 approved drugs across 2257 indications with at least one associated drug and 22 771 associations in the CTD drug-indication mapping; 1810 drugs across 535 indications and 1977 associations in the TTD mapping; and 2739 unique drugs altogether. Of these indications, 1595 were associated with at least two drugs and thus could be benchmarked in CTD, and 249 were associated with at least two drugs in TTD.
2.3 Benchmarking CANDO
Our original benchmarking protocol examined the similarity lists of each indication-associated drug (Minie et al. 2014, Sethi et al. 2015, Chopra and Samudrala 2016, Chopra et al. 2016, Falls et al. 2019, Fine et al. 2019, Mangione and Samudrala 2019, Schuler and Samudrala 2019, Mangione et al. 2020a,b, Hudson and Samudrala 2021, Overhoff et al. 2021, Mammen et al. 2022, Moukheiber et al. 2022, Mangione et al. 2022, Schuler et al. 2022, Bruggemann et al. 2023, Mangione et al. 2023). Indication accuracy (IA) was calculated as the percentage of these similarity lists in which at least one associated drug appeared above a cutoff. Indication accuracies were then averaged for every assessed indication to obtain average indication accuracy (AIA).
Our new benchmarking protocol uses the Python modules polars and sqlalchemy to rapidly evaluate the consensus scoring protocol and more accurately measure the performance of CANDO (Bayer 2012, Vink et al. 2024). To assess an indication, each associated drug is withheld from the indication in turn. Compounds are ranked as in the consensus scoring protocol (Section 2.1). Two additional tiebreakers are used to order compounds that fall outside of the similarity list cutoff: (i) average rank across the full similarity lists of indication-associated drugs and (ii) average similarity to associated drugs. The rank of the withheld drug in the final sorted list is determined and used to calculate multiple metrics.
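The leave-one-out evaluation can be sketched as follows. This simplified illustration applies only the consensus score and the average-rank tiebreaker, omitting the two additional tiebreakers used for compounds outside the similarity list cutoff; all names are ours:

```python
from collections import defaultdict

def leave_one_out_ranks(similarity_lists, indication_drugs, cutoff):
    """Rank at which each withheld drug is recovered by consensus scoring."""
    recovered = []
    for held_out in indication_drugs:
        rest = [d for d in indication_drugs if d != held_out]
        count, ranks = defaultdict(int), defaultdict(list)
        for drug in rest:
            for rank, comp in enumerate(similarity_lists[drug][:cutoff], 1):
                count[comp] += 1
                ranks[comp].append(rank)
        ordered = sorted(count, key=lambda c: (-count[c],
                                               sum(ranks[c]) / len(ranks[c])))
        if held_out in ordered:
            recovered.append(ordered.index(held_out) + 1)
        else:
            recovered.append(None)  # fell outside every similarity list window
    return recovered
```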
Our new protocol calculates two primary metrics. New indication accuracy (nIA) is calculated as the percentage of withheld drugs that are predicted at or above a defined rank cutoff in the consensus list. We chose rank cutoffs of 10, 25, and 100 for this study. nIA is averaged across all assessed indications to calculate new average indication accuracy (nAIA). Second, normalized discounted cumulative gain (NDCG) prioritizes early discovery of true positives (Schuler et al. 2022). We calculate discounted cumulative gain for each withheld drug as DCG = 1/log2(r + 1), where r is the rank at which the withheld drug was predicted.
This is divided by the ideal discounted cumulative gain (here, equal to one) to obtain NDCG. Greater NDCGs are better. This metric will be referred to as new NDCG (nNDCG) when calculated by our new protocol. We determined nNDCG without a rank cutoff (overall) and at rank cutoffs of 10, 25, and 100 in this study.
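Because each assessment involves a single withheld drug, the ideal DCG equals one, and under the standard single-relevant-item DCG each recovered drug contributes 1/log2(r + 1) at rank r. The two primary metrics then reduce to short functions; a sketch with hypothetical names:

```python
import math

def nia(ranks, cutoff):
    """Percentage of withheld drugs recovered at or above the rank cutoff."""
    hits = sum(r is not None and r <= cutoff for r in ranks)
    return 100.0 * hits / len(ranks)

def nndcg(ranks, cutoff=None):
    """Mean NDCG over withheld drugs; with one relevant item per trial,
    the ideal DCG is 1, so each trial contributes 1/log2(r + 1)."""
    gains = []
    for r in ranks:
        hit = r is not None and (cutoff is None or r <= cutoff)
        gains.append(1.0 / math.log2(r + 1) if hit else 0.0)
    return sum(gains) / len(gains)

def average_over_indications(metric, ranks_by_indication, **kw):
    """nAIA or average nNDCG: the per-indication metric, averaged."""
    vals = [metric(ranks, **kw) for ranks in ranks_by_indication.values()]
    return sum(vals) / len(vals)
```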
Statistical significance between benchmarking results was calculated with a two-sided Wilcoxon signed-rank test in scipy using either the ranks at which withheld drugs were predicted between matched drug-indication associations or performance on nIA between matched indications (Virtanen et al. 2020). Significance against a random control was calculated in scipy as a one-sided binomial test in which the number of successes (a withheld drug being ranked within the corresponding cutoff) across all associations was compared to the random control probability of success calculated from the hypergeometric distribution (Virtanen et al. 2020).
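The two significance tests can be reproduced with scipy. The values below are toy data, not our results; the hypergeometric call gives the chance that a single withheld drug lands in the top 10 of 2448 candidate compounds:

```python
from scipy.stats import wilcoxon, binomtest, hypergeom

# paired ranks of withheld drugs under two matched conditions (toy data)
ranks_a = [3, 12, 7, 45, 2, 18, 9]
ranks_b = [8, 20, 7, 60, 5, 25, 14]
w = wilcoxon(ranks_a, ranks_b, alternative="two-sided")

# probability that one relevant compound among 2448 candidates
# falls within the top 10 by chance (= 10 / 2448)
p_random = hypergeom.sf(0, 2448, 1, 10)

# one-sided binomial test: were there more top-10 recoveries than chance?
# (toy counts: 120 successes across 1600 drug-indication associations)
b = binomtest(k=120, n=1600, p=p_random, alternative="greater")
```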
2.4 Optimizing parameters
We optimized two CANDO parameters with regard to performance on nAIA and nNDCG. We randomly split the indications in our drug-indication mappings 30 to 70 to create independent mappings for parameter optimization and performance evaluation, limiting the risk of bias in our final assessment. All associations with the same indication were assigned to the same group, and only indications with at least two associated drugs were assessed. The CTD mapping was split into 5714 drug-indication associations across 501 indications for parameter optimization and 13 226 associations across 1094 indications for the final assessment. The smaller TTD mapping was split into 490 associations across 82 indications for parameter optimization and 1160 associations across 167 indications for the final assessment.
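The indication-level split might look like the following sketch (the seed and all names are ours, chosen for illustration):

```python
import random

def split_indications(mapping, frac_opt=0.3, seed=42):
    """30/70 indication-level split: all associations for an indication stay
    together, and only indications with >= 2 associated drugs are kept."""
    inds = sorted(i for i, drugs in mapping.items() if len(drugs) >= 2)
    random.Random(seed).shuffle(inds)
    k = round(frac_opt * len(inds))
    optimization, evaluation = inds[:k], inds[k:]
    return ({i: mapping[i] for i in optimization},
            {i: mapping[i] for i in evaluation})
```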
The first parameter optimized was the similarity list cutoff used in our consensus scoring protocol (Section 2.1). We quantified performance using nAIA and nNDCG on the CTD and TTD drug-indication mappings for every value of this parameter up to the number of approved drugs in the mapping (2449 for CTD and 1810 for TTD). The optimal parameter was the similarity list cutoff used when each metric reached its maximum. A random control was calculated for each metric and mapping at each optimal parameter. A hypergeometric distribution was used to calculate the control nAIA. For nNDCG, 10 randomized drug–protein interaction matrices were generated and benchmarked per optimal value and mapping, and the nNDCGs were averaged.
The second parameter optimized was the compound–protein interaction scoring type. We benchmarked CANDO using proteomic interaction matrices generated using each of the three BANDOCK scoring types (Section 2.2) with similarity cutoffs ranging from 1 to 100. We compared the best performances of each scoring type using nAIA and nNDCG.
2.5 Evaluating performance
A final assessment was completed using the 70% of indications not used for parameter optimization. Similarity list cutoffs of 6, 10, and 13 and the compound-and-protein interaction score were used based on the parameter optimization results of both mappings (Section 2.4). We calculated nAIA and nNDCG at rank cutoffs of 10, 25, and 100, in addition to overall nNDCG, in this final assessment.
We examined how three features correlated with performance, as quantified by nIA: the number of drugs associated with an indication, similarity list quality as quantified by our previous benchmarking metric (IA), and the chemical similarity of indication-associated drugs. Spearman correlation was used as our metrics do not follow a normal distribution, violating the assumptions of Pearson correlation. Correlation coefficients were calculated using the scipy package (Virtanen et al. 2020). IA was generated by our original benchmarking protocol (Section 2.3; Mangione et al. 2020a). Drug–drug chemical signature similarity was calculated using the Tanimoto coefficient on 2048-bit ECFP4 vectors, which encode the chemical features of a compound. ECFP4 fingerprints were generated by RDKit (Tanimoto 1957, Landrum 2010). Three similarity metrics were calculated for each indication: best similarity between any pair of associated drugs, average of the best similarities of each associated drug, and average of the average similarities of each associated drug. The correlation coefficients between each of these three factors were also calculated to determine how they correspond to one another, and a multiple linear regression model was trained on the normalized factors to measure the relative extents of their relationship with nIA.
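The Tanimoto calculation and the three per-indication summaries can be sketched without RDKit by representing each 2048-bit ECFP4 fingerprint as its set of on-bits (in practice, RDKit supplies the fingerprints themselves); the function names are ours:

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient on fingerprints given as sets of on-bit indices."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0

def indication_similarity(fps):
    """Three per-indication summaries of drug-drug chemical similarity:
    best pairwise, average of per-drug bests, average of per-drug averages."""
    n = len(fps)
    best, avg = [], []
    for i in range(n):
        sims = [tanimoto(fps[i], fps[j]) for j in range(n) if j != i]
        best.append(max(sims))
        avg.append(sum(sims) / len(sims))
    return max(best), sum(best) / n, sum(avg) / n
```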
2.6 Comparing drug-indication mappings
We compared benchmarking performance when using the mappings extracted from CTD and TTD. We combined the drugs from both mappings into a single library and benchmarked CANDO using this library and each full mapping. We manually matched each TTD indication to the most similar CTD indication. TTD indications were excluded if no appropriate CTD indication existed or if the most appropriate CTD indication was already matched to a more similar TTD indication. Performance on each mapping, quantified as nAIA, was compared for the matched indications. Finally, we compared the rankings of the drugs that were associated with the same indications in both CTD and TTD.
3 Results and discussion
In this study, we created two new benchmarking protocols to allow more consistent assessment of CANDO and computational drug discovery platforms in general. We present results obtained via these new protocols, including (i) the optimization of key parameters involved in our primary prediction pipeline; (ii) an assessment of CANDO using these optimized parameters, including the correlations between performance on the new benchmarking protocol and the number of drugs associated with a disease/indication, performance on our original benchmarking protocol, and the drug–drug chemical signature similarity within an indication; and (iii) a comparison of performance using two different drug-indication mappings.
3.1 Optimization of two key CANDO parameters
Our new internal benchmarking protocol allows us to directly assess the performance of the consensus scoring protocol CANDO uses to rank potential therapeutics (Section 2.3). This allowed us to optimize a key parameter of this protocol. CANDO uses a similarity list cutoff to determine how many similar compounds are considered per indication-associated drug (Section 2.1). Various similarity list cutoffs have been used in previous applications of CANDO (Chopra et al. 2016, Fine et al. 2019, Mangione et al. 2020b, Mammen et al. 2022, Moukheiber et al. 2022, Bruggemann et al. 2023).
We benchmarked the performance of CANDO at every possible similarity list cutoff on subsets of drug-indication mappings extracted from the CTD and TTD. Results were quantified using two metrics: nAIA and new normalized discounted cumulative gain (nNDCG). nAIA represents the average percentage of drugs correctly predicted for all indications above a given cutoff; it is simple and interpretable, with clear applicability to practical performance. nNDCG prioritizes early discovery of true positives, making it particularly relevant to applications like drug discovery, in which ranking excellent candidates highly may be more important than accurately ordering later candidates (Schuler et al. 2022). We report these metrics at top10, top25, and top100 rank cutoffs; nNDCG was also calculated without a rank cutoff. The results for similarity list cutoffs up to 1810 are shown in Fig. 2.
Figure 2.
Effects of the similarity list cutoff on benchmarking performance. We used our new benchmarking protocol to optimize the similarity list cutoff, which represents the number of similar compounds the consensus protocol considers per associated drug when predicting a new compound for an indication. Assessments were completed on two drug-indication mappings extracted from the CTD and TTD. Results were summarized using nAIA and nNDCG metrics at multiple rank cutoffs; nNDCG was also calculated without a rank cutoff. Performance using nAIA (A) and nNDCG (B) is shown for similarity list cutoffs up to 1810. Dotted vertical lines indicate the cutoffs of 1 and 50, between which all optimal values for this parameter fall. An expanded graph of only this range is shown for nAIA (C) and nNDCG (D). Bar charts (E and F) show the maximum values of each metric against random controls. Optimal values are marked with a triangle and listed in the tables (G) at the bottom. The optimal parameter values for nAIA varied from 6 to 31 based on the cutoff and mapping used. The range was smaller for nNDCG, ranging from 7 to 13. The similarity list cutoff affected performance on multiple key metrics, and optimal performance was only achieved when <2% of compounds were considered.
Performance varied based on the similarity list cutoff used. The largest range was observed in the CTD mapping using nAIA top100, which ranged from 9.2% (similarity list cutoff of 805) to 21.4% (cutoff of 31). CANDO outperformed the random control across all metrics and similarity list cutoffs. Optimal parameter values ranged from 6 (nAIA top10 using CTD and nAIA top25 using TTD) to 31 (nAIA top100 using CTD). Performance was better on all metrics and cutoffs when using the TTD mapping instead of the CTD mapping. The differences between the nIAs at a cutoff of 10 (one maximum) and the minimum nIAs for each mapping were statistically significant (P-value ) at the top10, top25, and top100 cutoffs.
We also assessed the effect of the protocol used to calculate drug-protein interaction scores. Our BANDOCK interaction scoring protocol computes three types of interaction scores: compound-only, compound-and-protein, and percentile compound-and-protein (Section 2.2). We optimized the similarity list cutoff using nAIA and nNDCG with all three interaction scoring types; the best performances of each scoring type on each metric were compared (Table 1, available as supplementary data at Bioinformatics online). The compound-and-protein score showed the best performance on most metrics and the percentile compound-and-protein score performed best on the remainder when using the CTD mapping. The compound-only score performed best on most metrics when using the TTD mapping. None of these differences, however, were statistically significant (P-value ) when optimal performances were compared. We used the compound-and-protein score for the remaining trials as it was never the worst performing score, though there is insufficient evidence to conclude that the score used affected performance.
3.2 Assessment of predictive power
We conducted a second assessment to determine the predictive power of CANDO using our optimized parameters. Three assessments were performed on the drug-indication associations not used for optimization, using the compound-and-protein interaction score and similarity list cutoffs of 6, 10, and 13, chosen based on our optimization trials. The results are shown in Fig. 3A and B.
Figure 3.
Assessment of predictive power. CANDO was assessed using the protocols and parameters obtained through our optimization. nAIA (A) and nNDCG (B) metrics are shown at multiple cutoffs for the two drug-indication mappings, CTD and TTD. The random control is shown as a dotted line on each group of bars with the same mapping and cutoff. CANDO outperformed the control on all assessments, and performance was best when using the TTD mapping. Performance on this assessment was correlated with multiple features: the number of compounds in an indication (C and D); our original indication accuracy (IA) metric, which measures similarity list quality (E and F); and drug–drug chemical signature similarity within each indication, measured as the average Tanimoto coefficient between the chemical fingerprint of each drug and that of its most similar other associated drug (G and H). The upper subfigures (C, E, and G) plot each feature against nIA top100 in CTD, whereas the lower plots (D, F, and H) show the relationship between the same two features when their values are ranked; these ranks were used to calculate Spearman correlation coefficients. The size of the bubble surrounding each dot represents the number of indications plotted there. Trendlines are shown as dotted black lines. Positive correlations of varying strength were observed in all cases. Knowledge of the features influencing benchmarking can enable more accurate assessment of expected predictive performance. Based on these results, we can expect CANDO to perform best when predicting compounds for indications with large numbers of associated drugs and high chemical signature similarity.
CANDO outperformed random controls using both drug-indication mappings and all metrics. nAIA results showed that CANDO recovered 7.3% to 7.4% of approved drugs within the top 10 compounds using the CTD mapping and 11.4% to 12.1% using the TTD mapping (out of 2449 compounds in CTD and 1810 in TTD). This rose to 19.0% to 21.1% using CTD and 29.9% to 31.0% using TTD at the top 100 cutoff. nNDCG top10 ranged from 0.038 to 0.040 using CTD and 0.064 to 0.068 using TTD. Performance using both mappings and at all cutoffs was statistically significantly better than the corresponding random control values. We also repeated these calculations on only indications with three or more associated drugs; performance was similar or slightly improved. These results, as well as data for all metrics and similarity list cutoffs, are available in Table 2, available as supplementary data at Bioinformatics online.
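As a concrete illustration of how these recall-style metrics behave, the following stdlib-only Python sketch computes nIA (per-indication recall at a cutoff) and nAIA (its average over indications) from hypothetical leave-one-out ranks; the actual cando.py implementation may differ in detail.

```python
def nia(ranks, cutoff):
    """nIA: percentage of an indication's associated drugs whose
    withheld leave-one-out rank falls within the top-`cutoff` compounds."""
    hits = sum(1 for r in ranks if r <= cutoff)
    return 100.0 * hits / len(ranks)

def naia(indication_ranks, cutoff):
    """nAIA: nIA averaged over all indications."""
    scores = [nia(ranks, cutoff) for ranks in indication_ranks.values()]
    return sum(scores) / len(scores)

# Hypothetical leave-one-out ranks for three toy indications
indication_ranks = {
    "ind_a": [3, 57, 210],   # one of three drugs in the top 10
    "ind_b": [8, 9],         # both drugs in the top 10
    "ind_c": [450],          # no drug in the top 10
}
print(naia(indication_ranks, 10))   # averages ~33.3%, 100%, and 0%
```

A larger cutoff (e.g. 100) can only raise each per-indication recall, which is why the top100 numbers above exceed the top10 numbers.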
3.2.1 Correlation with the number of associated drugs
We investigated the relationships between three features of indications and the performance of CANDO in order to determine what types of indications CANDO performs best on. First, we considered how the number of drugs associated with an indication (indication size) relates to performance.
Greater data availability generally corresponds with improved performance of computational models, so we anticipated a positive correlation between nIA and indication size. There was indeed a weak positive correlation, with Spearman correlation coefficients ranging from 0.324 to 0.352 using the CTD mapping and from 0.337 to 0.505 using the TTD mapping. The correlation between nIA at the top100 cutoff and indication size using the CTD mapping is illustrated in Fig. 3C and D. Correlations between nIA and indication size at all cutoffs are shown in Fig. 1, available as supplementary data at Bioinformatics online.
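The coefficients above are Spearman rank correlations, i.e. the Pearson correlation computed on the ranked values (as in the ranked plots of Fig. 3D). A stdlib-only sketch with hypothetical indication sizes and nIA values:

```python
from statistics import mean

def rankdata(xs):
    """Assign ranks (1-based), averaging the ranks of tied values."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(xs):
        j = i
        while j + 1 < len(xs) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Spearman coefficient: Pearson correlation of the two rank vectors."""
    rx, ry = rankdata(xs), rankdata(ys)
    mx, my = mean(rx), mean(ry)
    num = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    den = (sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry)) ** 0.5
    return num / den

sizes = [3, 5, 8, 12, 40]              # hypothetical indication sizes
nias = [0.0, 10.0, 5.0, 20.0, 30.0]    # hypothetical top10 nIA values
print(round(spearman(sizes, nias), 3))
```

In practice a library routine such as `scipy.stats.spearmanr` would be used; the sketch makes the rank-then-correlate definition explicit.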
We wondered if the large number of indications with few associated drugs and an nIA of zero may influence this apparent positive correlation. We thus re-calculated the correlation coefficient using only indications with five or more associated drugs (Table 3A, available as supplementary data at Bioinformatics online). The strength of the correlation between nIA and indication size weakened in this assessment, becoming negligible at 0.007 to 0.071 using CTD and shrinking to 0.057 to 0.252 using TTD. This suggests that indication size has a limited relationship with performance for larger indications. This could be influenced by the presence of more varied mechanisms of action and disease subtypes (e.g. HER2-positive or triple negative breast cancer) in indications with more associated drugs, which could decrease the similarity of associated drugs.
3.2.2 Correlation with similarity list quality
We also considered the primary metric of our previous internal benchmarking protocol: indication accuracy (IA; Section 2.3). IA directly measures the quality of the drug–drug interaction signature similarity ranks calculated by CANDO. IA is more lenient and thus tends to be higher than nIA: IA checks whether at least one other associated drug appears above a certain cutoff rather than the percentage of associated drugs recalled.
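Under one reading of this description (a hedged sketch, not the exact CANDO code), IA for an indication can be computed from the similarity list of each associated drug, crediting a drug as soon as any other associated drug appears within the cutoff; the `sim_ranks` data here are hypothetical.

```python
def ia(sim_ranks, cutoff):
    """IA (lenient): percentage of an indication's drugs for which at
    least one *other* associated drug appears within `cutoff` on that
    drug's compound-similarity list."""
    hits = sum(1 for others in sim_ranks if any(r <= cutoff for r in others))
    return 100.0 * hits / len(sim_ranks)

# Hypothetical ranks of the other associated drugs on each drug's
# similarity list (an indication with three associated drugs)
sim_ranks = [[2, 340], [15, 80], [7, 120]]
print(ia(sim_ranks, 10))  # ~66.7: two of three drugs have a partner in their top 10
```

Note the contrast with nIA: here the drug at ranks [15, 80] contributes nothing at the top10 cutoff, but a single partner at rank 2 or 7 fully credits the other two drugs.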
We found a strong relationship between nIA and IA at the top10, 25, and 100 cutoffs for each indication, with Spearman correlation coefficients ranging from 0.741 to 0.807 using the CTD mapping and 0.859 to 0.905 using the TTD mapping. No relationship was observed between the cutoff considered and the strength of the correlation. The correlation between nIA and IA at the top100 cutoff using the CTD mapping is illustrated in Fig. 3E and F, and the remaining correlations are illustrated in Fig. 2, available as supplementary data at Bioinformatics online. The correspondence between IA and nIA suggests that our previous benchmarking results did have relevance to actual performance, as has also been demonstrated by extensive prospective validation (Jenwitheesuk and Samudrala 2005, Jenwitheesuk et al. 2008, Costin et al. 2010, Nicholson et al. 2011, Michael et al. 2013, 2014, Minie et al. 2014, Chopra et al. 2016, Fine et al. 2020, Mangione et al. 2020b, Chatrikhi et al. 2021, Palanikumar et al. 2021, Falls et al. 2022, Mammen et al. 2022, Mangione et al. 2022, Schuler et al. 2022, Bruggemann et al. 2023). This also suggests a correlation between high-quality similarity lists and high-quality consensus predictions.
3.2.3 Correlation with chemical signature similarity
CANDO typically uses proteomic interaction signature similarity between indication-associated drugs and other compounds to predict whether those compounds would be effective for that indication. Compounds that are alike in chemical structure tend to have similar protein interactions and, therefore, similar interaction signatures (Schuler and Samudrala 2019). Note that the use of only approved drugs confounds this assessment due to the relatively high presence of “me-too” drugs compared to small molecules as a whole.
We assessed the extent to which chemical similarity, calculated as the Tanimoto coefficient (Tanimoto 1957) between the chemical fingerprints of two drugs, correlates with the performance of CANDO. We quantified chemical similarity within an indication through three metrics: maximum similarity between any pair of associated drugs; average similarity across all pairs of associated drugs; and the average of the maximum similarities of each associated drug. The correlation was strongest using average maximum similarity, for which coefficients ranged from 0.635 to 0.750 using CTD and 0.697 to 0.744 using TTD (Table 3C, available as supplementary data at Bioinformatics online). The correlation between nIA top100 and average maximum similarity using the CTD mapping is illustrated in Fig. 3G and H, and correlations with nIA top10, top25, and top100 using both mappings are shown in Fig. 3, available as supplementary data at Bioinformatics online. Though the correlation was moderate-to-strong, the fact that the correlation was greatest when using average maximum similarity suggests that CANDO performs best on indications in which most drugs have at least one chemically similar partner; the relationship with a measure of overall chemical homogeneity (average similarity) is not as strong. This is supported by our previous work, which demonstrated that CANDO performs better when using protein interaction signatures than chemical similarity (i.e. molecular fingerprints), showing that the platform is not restricted to predicting compounds based on chemical similarity alone (Schuler and Samudrala 2019, Mangione et al. 2023). CANDO has also demonstrated its ability to design and/or select compounds occupying novel chemical space when the appropriate inputs are used (Overhoff et al. 2021).
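The three per-indication similarity summaries can be sketched as follows, representing each fingerprint as a set of on-bit indices (the data are hypothetical; a real analysis would use actual chemical fingerprints, e.g. as generated by RDKit):

```python
from itertools import combinations

def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two fingerprints stored as sets of
    on-bit indices: |intersection| / |union|."""
    return len(fp_a & fp_b) / len(fp_a | fp_b)

def indication_similarities(fps):
    """The three per-indication similarity summaries: maximum pairwise,
    average pairwise, and average of each drug's maximum similarity."""
    pair_sims = [tanimoto(a, b) for a, b in combinations(fps, 2)]
    per_drug_max = [max(tanimoto(a, b) for b in fps if b is not a) for a in fps]
    return {
        "max": max(pair_sims),
        "average": sum(pair_sims) / len(pair_sims),
        "average_max": sum(per_drug_max) / len(per_drug_max),
    }

# Hypothetical on-bit sets for three associated drugs: two similar, one outlier
fps = [{1, 2, 3, 4}, {1, 2, 3, 5}, {7, 8, 9}]
print(indication_similarities(fps))
```

The toy data show why the metrics diverge: the outlier drug drags down both averages, but the maximum and average-maximum metrics still register the similar pair.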
We also examined the relationships among the three independent variables themselves. All three were moderately-to-strongly correlated (coefficient > 0.4), with one exception: the correlation between the number of drugs in an indication and average similarity was weak to nonexistent (0 < coefficient < 0.2; Table 3D, available as supplementary data at Bioinformatics online). We also examined the relative power of these three variables to predict nIA by normalizing them and training a multiple linear regression model on the normalized values. IA had the greatest weight/coefficient when all three variables were used, and, interestingly, the number of compounds in an indication was generally given a negative coefficient. We repeated this trial excluding IA, as its large coefficient overpowered the other two factors. All three similarity metrics were given larger coefficients than indication size in this scenario. Indication size had a negative coefficient when used in a model with maximum similarity or average maximum similarity, but zero-to-positive when used with average similarity. This suggests that a high IA (i.e. high similarity list quality) is the strongest predictor of whether CANDO will perform well on an indication. The number of drugs and chemical similarity within an indication are, in turn, correlated with IA. The previously observed positive correlation between the number of drugs in an indication and performance may additionally be a function of its relationship with IA and certain similarity metrics, as a negative coefficient was calculated for this factor when all three were used together to predict performance.
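The normalize-then-compare-coefficients analysis can be sketched as follows with two hypothetical features and stdlib-only Python; the study's actual models used IA, indication size, and the chemical similarity metrics, with values reported in the supplementary tables.

```python
def zscore(xs):
    """Standardize a feature to mean 0 and population SD 1, so that
    regression coefficient magnitudes become directly comparable."""
    m = sum(xs) / len(xs)
    s = (sum((x - m) ** 2 for x in xs) / len(xs)) ** 0.5
    return [(x - m) / s for x in xs]

def solve(A, b):
    """Gauss-Jordan elimination with partial pivoting for a small system Ax = b."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [M[r][k] - f * M[c][k] for k in range(n + 1)]
    return [M[i][n] / M[i][i] for i in range(n)]

def standardized_coefficients(features, y):
    """Ordinary least squares on z-scored features via the normal
    equations; the intercept is absorbed by centering y."""
    X = [zscore(f) for f in features]
    ym = sum(y) / len(y)
    yc = [v - ym for v in y]
    m = len(y)
    A = [[sum(X[i][k] * X[j][k] for k in range(m)) for j in range(len(X))]
         for i in range(len(X))]
    b = [sum(X[i][k] * yc[k] for k in range(m)) for i in range(len(X))]
    return solve(A, b)

# Hypothetical per-indication features: IA and average maximum similarity
ia_vals = [20.0, 35.0, 50.0, 65.0, 90.0]
avg_max = [0.30, 0.20, 0.55, 0.40, 0.70]
# Hypothetical nIA constructed so the true standardized weights are 3 and -1
z1, z2 = zscore(ia_vals), zscore(avg_max)
nia_vals = [3 * a - 1 * b + 10 for a, b in zip(z1, z2)]
print(standardized_coefficients([ia_vals, avg_max], nia_vals))
```

Because the features are standardized first, the fitted coefficients can be ranked against one another, which is the basis for the comparison of feature importance described above.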
3.3 Comparison of drug-indication mappings
CANDO consistently performed better when using the drug-indication mapping created from the TTD relative to the one from the CTD. There are a few reasons why this could occur. First, CTD and TTD contain different indications. If the indications in TTD are easier to predict drugs for than those in CTD, this could result in better benchmarking performance when using TTD. Second, CTD has more approved drugs in total, which means that each assessed drug has to outcompete more total compounds during benchmarking, decreasing performance. Finally, the TTD mapping has a higher standard of evidence for inclusion (FDA approval) than CTD (therapeutic association in the literature). This could lead to higher data quality and, thus, better performance.
We examined these drug-indication mappings head-to-head to determine whether using TTD actually improved performance. We benchmarked CANDO using both full mappings and the full library of drugs that are approved in either mapping. CANDO still performed better when using the TTD mapping, with a top10 nAIA of 6.8% using CTD compared to 11.3% using TTD. However, this difference decreased when only the 191 indications that appeared in both mappings were assessed. The top10 nAIA using CTD with matched indications was 6.5% compared to 9.3% using TTD. The differences in nIA top10 and top100 on each matched indication are shown in Fig. 4A and B. Note that the CTD mapping performed better on more indications than the TTD, but the TTD generally outperformed by a greater magnitude.
Figure 4.
Influence of drug-indication mapping on performance. CANDO was benchmarked using drug-indication mappings extracted from two databases, CTD and TTD. The differences in nIA at the top10 (A) and top100 (B) cutoffs for each indication that appeared in both mappings are shown. Performance was better using CTD for more indications, but TTD outperformed by more when it was superior. This led to a higher overall nAIA when using TTD. We also compared the ranks of 576 drug-indication associations that appeared in both mappings. The ranks of those drugs when predicted for their indications in each mapping are plotted in log scale (C) and arithmetic scale (D). Black lines indicate the 100th rank, beyond which predictions are less likely to be useful for drug discovery, and a grey line represents equivalent ranks between the mappings. The number of associations for which each mapping performed better is shown (E); these counts are separated by whether both, only one, or neither mapping ranked the drug within the top100 cutoff. A drug was more likely to rank well for its indication when using the TTD drug-indication mapping, but more individual indications performed better at the most stringent top10 cutoff when using the CTD mapping.
This trend was reversed when we grouped matched indications by the top-level MeSH headings they were associated with; TTD performed better on top10, top25, and top100 nAIA on 14, 13, and 15 out of 24 top-level MeSH headings, as compared to 7, 10, and 8 for CTD. Among those top-level groups containing at least 10 indications, TTD outperformed CTD by the most on “Endocrine System Diseases” (12 indications, top100 nAIA of 26.9% using CTD versus 46.3% using TTD), and CTD outperformed TTD by the most on “Infections” (32 indications, top100 nAIA of 37.6% using CTD versus 23.5% using TTD). The full results for this analysis are available in Table 4, available as supplementary data at Bioinformatics online.
TTD still performed better when only matched indications were considered; however, this difference was not statistically significant. The top10 nAIA was 21.5% greater when the full TTD mapping was used compared to only those indications matched with CTD indications, whereas there was only a 4.6% increase when the full CTD mapping was used over the matched indications. This indicates that the apparently better performance when using TTD may in part be due to its inclusion of “easier” indications that CANDO performs better on. Indications that were exclusively benchmarked using TTD with high nIAs include anesthesia (ICD-11 9A78.6; 33 drugs, 15.1% top10 nIA), contraception (ICD-11 QA21; 10 drugs, 30% top10 nIA), and virus infection (ICD-11 1A24-1D9Z; 8 drugs, 50% top10 nIA).
We next examined the 576 drug-indication associations that appeared in both CTD and TTD. The rankings our benchmarking protocol assigned to these drugs when using each mapping are plotted in Fig. 4C–E. Of these drugs, 208 had better ranks using CTD, 359 had better ranks using TTD, and 9 had the same ranks either way. Drugs were also more frequently ranked within the top10, 25, and 100 cutoffs when using TTD. CTD generally outperformed TTD by a greater magnitude when it was better, but, on average, matched drugs were predicted 23.5 ranks better when using TTD. A P-value was also calculated comparing the ranks given to withheld drugs in the paired associations.
We analyzed some specific drug-indication associations to further illustrate this difference. We first considered enzalutamide, an androgen receptor inhibitor approved for use in patients with prostate cancer (Astellas Pharma Inc 2025). CANDO ranked enzalutamide 44th for this indication using CTD, but its rank improved to 7th using TTD. Enzalutamide had the same consensus score across both mappings because it was among the top ten most similar compounds to the same drugs in both: bicalutamide, flutamide, and nilutamide. TTD included only 13 drugs in its prostate cancer indication, whereas CTD included 88 drugs. The greater number of associated drugs in the CTD indication thus led to higher consensus scores in general, making the score of enzalutamide less remarkable. The CTD prostate cancer indication included drugs like methotrexate, gemcitabine, and thalidomide that have shown efficacy for other cancers, but which are not approved for use in prostate cancer specifically (Smith et al. 1990, Sissung et al. 2009, Garcia et al. 2011, Lee et al. 2014, Bristol–Myers Squibb Company 2023, Hospira Inc 2023, Hospira Inc 2024). CTD also contains a related indication: “Prostatic neoplasms, castration-resistant.” Enzalutamide, which is approved for use against such neoplasms, ranked second for this indication during benchmarking based on its similarity with bicalutamide and apalutamide. Therefore, the greater availability of data in CTD can also be beneficial due to the existence of a greater quantity of more specific indications that may not be available in TTD.
Apalutamide is another drug approved for use in castration-resistant or metastatic castration-sensitive prostate cancer (Janssen Pharmaceutical Companies 2024). It was associated with castration-resistant prostate cancer in CTD, but neither mapping associated apalutamide with the general prostate cancer indication. We used the primary prediction pipeline of CANDO to fill this gap: it ranked apalutamide 81st for prostate cancer in CTD and 14th in TTD, providing additional evidence of the superiority of TTD even on associations not included in the original data. However, this also illustrates the limitations of the mappings used. No data source is perfect; if a perfect drug-indication mapping existed, drug discovery pipelines would be unnecessary. Still, such avoidable errors impact performance: enzalutamide would have ranked higher in both mappings had apalutamide been included. In addition, errors decrease the reliability of benchmarking, as platforms are judged on the ability to predict fictitious drug-indication associations while real relationships are left unassessed. Updates and quality control of drug-indication data are thus necessary to ensure our platforms are built on reliable data and our assessments remain as accurate as possible.
In-depth and accurate benchmarking helps improve drug discovery platforms, ensures the reliability of published platforms, and maintains the quality of the field as a whole. Internal benchmarking has shown that CANDO consistently outperforms random chance, particularly when using a drug-indication mapping consisting of approved associations only. CANDO was able to predict up to 31% of existing drugs in the top 100 potential therapeutics for their indications when using TTD. We were also able to optimize multiple parameters and elucidate the relationships of multiple factors with performance using our new benchmarking protocol, and we found improved performance in ranking known drug-indication associations when using FDA-approved-only associations from TTD as input.
The findings, protocols, and metrics of this study are also applicable to the benchmarking of other drug discovery platforms. The protocols we used to compare CTD and TTD could be adopted by other platforms to explicitly test whether a greater quantity or a higher quality of drug-indication associations benefits them more. A leave-one-out protocol is computationally expensive, but the comprehensiveness and minimal exclusion of training data resulting from testing each association individually can be beneficial for refining and assessing platforms where such expense is not prohibitive; a k-fold cross-validation strategy may be more appropriate in other cases. We also emphasize the importance of assessing the endpoint of a drug discovery pipeline, after any adjustments or consensus scoring protocols are applied, to ensure the most accurate picture of its performance is captured. Finally, we recommend using a combination of interpretable metrics, such as nAIA, recall, or precision at a given rank or probability cutoff, and metrics that weight highly-ranked positives over a long tail of better-than-random rankings, such as NDCG or the Boltzmann-enhanced discrimination of the receiver operating characteristic (BEDROC) (Schuler et al. 2022).
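NDCG illustrates the second class of metrics: gains are discounted logarithmically by rank, so a hit at rank 1 contributes far more than the same hit at rank 100. A minimal sketch with hypothetical binary relevance vectors:

```python
from math import log2

def ndcg(ranked_relevance, cutoff):
    """NDCG at a cutoff: discounted cumulative gain of the actual ranking
    divided by that of the ideal (sorted) ranking, so early hits count more."""
    dcg = sum(rel / log2(i + 2) for i, rel in enumerate(ranked_relevance[:cutoff]))
    ideal = sorted(ranked_relevance, reverse=True)
    idcg = sum(rel / log2(i + 2) for i, rel in enumerate(ideal[:cutoff]))
    return dcg / idcg if idcg else 0.0

# Two hypothetical rankings with the same two hits among ten compounds
early = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
late = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
print(ndcg(early, 10) > ndcg(late, 10))  # True: early hits score higher
```

Recall at a cutoff would score both rankings identically (both recover two hits in the top 10), which is exactly why pairing an interpretable metric with a rank-weighted one gives a fuller picture.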
Our current work is limited in that it is an internal benchmarking method, and it does not inherently allow comparison with other drug discovery platforms. Any such comparison would also be limited in quality due to differences in protocols, drug-indication mappings, and input data. Future work will therefore focus on creating frameworks and datasets that would allow for comparable, head-to-head benchmarking of drug discovery platforms, including CANDO. We also intend to utilize existing and future benchmarking tools to assist in the continued improvement of CANDO itself, including further optimization of our consensus scoring protocol through, for example, the weighting of similarity scores by protein evidence level, disease relevance, and/or gene expression profiles of indication-relevant cells.
Supplementary Material
Acknowledgements
The authors would like to acknowledge the Center for Computational Research of the University at Buffalo for their computational resources and support. We would also like to thank the members of the Samudrala Computational Biology Group.
Contributor Information
Melissa Van Norden, Department of Biomedical Informatics, Jacobs School of Medicine and Biomedical Sciences, University at Buffalo, Buffalo, NY 14203, United States.
William Mangione, Department of Biomedical Informatics, Jacobs School of Medicine and Biomedical Sciences, University at Buffalo, Buffalo, NY 14203, United States.
Zackary Falls, Department of Biomedical Informatics, Jacobs School of Medicine and Biomedical Sciences, University at Buffalo, Buffalo, NY 14203, United States.
Ram Samudrala, Department of Biomedical Informatics, Jacobs School of Medicine and Biomedical Sciences, University at Buffalo, Buffalo, NY 14203, United States.
Author contributions
Melissa Van Norden (Conceptualization [equal], Data curation [equal], Methodology [equal], Software [equal], Validation [equal], Visualization [lead], Writing—original draft [lead], Writing—review & editing [equal]), William Mangione (Data curation [equal], Methodology [supporting], Software [supporting], Writing—review & editing [supporting]), Zackary Falls (Conceptualization [equal], Data curation [equal], Funding acquisition [supporting], Methodology [equal], Project administration [equal], Software [equal], Supervision [equal], Writing—review & editing [equal]), and Ram Samudrala (Conceptualization [equal], Funding acquisition [lead], Methodology [equal], Project administration [equal], Supervision [equal], Visualization [supporting], Writing—review & editing [equal]).
Supplementary data
Supplementary data are available at Bioinformatics online.
Conflict of interest: None declared.
Funding
This work was supported by a National Institutes of Health Director’s Pioneer Award [DP1OD006779]; a National Center for Advancing Translational Sciences Clinical and Translational Sciences Award, ASPIRE Design Challenge Award, and ASPIRE Reduction-to-Practice Award [UL1TR001412]; a National Library of Medicine T15 Award and R25 Award [T15LM012495, R25LM014213]; a National Institute of Standards and Technology Award [60NANB22D168]; a National Institute on Drug Abuse Mentored Research Scientist Development Award [K01DA056690]; and startup funds from the Department of Biomedical Informatics at the University at Buffalo.
Data availability
CANDO is publicly available through Github at https://github.com/ram-compbio/CANDO. Supplementary data, drug-indication interaction matrices, and drug-indication mappings are available at http://compbio.buffalo.edu/data/mc_cando_benchmarking2. Data and code are also archived via Zenodo at https://doi.org/10.5281/zenodo.17241534.
References
- Astellas Pharma Inc. Xtandi (Enzalutamide) [Package Insert]. Drugs@FDA. 2025. https://www.accessdata.fda.gov/drugsatfda_docs/label/2025/203415s024,213674s012lbl.pdf (23 April 2025, date last accessed).
- Bayer M. SQLAlchemy. In: Brown A, Wilson G (eds) The Architecture of Open Source Applications Volume II: Structure, Scale, and a Few More Fearless Hacks. 2012. aosabook.org.
- Berman HM, Westbrook J, Feng Z et al. The protein data bank. Nucleic Acids Res 2000;28:235–42.
- Bristol–Myers Squibb Company. Thalomid (Thalidomide) [Package Insert]. Drugs@FDA. 2023. https://www.accessdata.fda.gov/drugsatfda_docs/label/2023/020785s071lbl.pdf (23 April 2025, date last accessed).
- Brown AS, Patel CJ. A review of validation strategies for computational drug repositioning. Brief Bioinform 2018;19:174–7.
- Bruggemann L, Falls Z, Mangione W et al. Multiscale analysis and validation of effective drug combinations targeting driver KRAS mutations in non-small cell lung cancer. Int J Mol Sci 2023;24:997.
- Cai L, Lu C, Xu J et al. Drug repositioning based on the heterogeneous information fusion graph convolutional network. Brief Bioinform 2021;22:bbab319.
- Chatrikhi R, Feeney CF, Pulvino MJ et al. A synthetic small molecule stalls pre-mRNA splicing by promoting an early-stage U2AF2-RNA complex. Cell Chem Biol 2021;28:1145–57.e6.
- Cheng J, Xie Q, Kumar V et al. Evaluation of analytical methods for connectivity map data. In: Altman RB, Dunker AK, Hunter L, et al. (eds.), Biocomputing 2013. Singapore: World Scientific, 2013, 5–16.
- Cheng J, Yang L, Kumar V et al. Systematic evaluation of connectivity map for disease indications. Genome Med 2014;6:540–8.
- Chiang AP, Butte AJ. Systematic evaluation of drug–disease relationships to identify leads for novel drug uses. Clin Pharmacol Ther 2009;86:507–10.
- Chopra G, Kaushik S, Elkin PL et al. Combating Ebola with repurposed therapeutics using the CANDO platform. Molecules 2016;21:1537.
- Chopra G, Samudrala R. Exploring polypharmacology in drug discovery and repurposing using the CANDO platform. Curr Pharm Des 2016;22:3109–23.
- Ciriaco F, Gambacorta N, Trisciuzzi D et al. PLATO: a predictive drug discovery web platform for efficient target fishing and bioactivity profiling of small molecules. Int J Mol Sci 2022;23:5245.
- Costin JM, Jenwitheesuk E, Lok S-M et al. Structural optimization and de novo design of dengue virus entry inhibitory peptides. PLoS Negl Trop Dis 2010;4:e721.
- Crisan L, Istrate D, Bora A et al. Virtual screening and drug repurposing experiments to identify potential novel selective MAO-B inhibitors for Parkinson’s disease treatment. Mol Divers 2021;25:1775–94.
- Davis AP, Wiegers TC, Johnson RJ et al. Comparative toxicogenomics database (CTD): update 2023. Nucleic Acids Res 2023;51:D1257–62.
- DiMasi JA, Grabowski HG, Hansen RW. Innovation in the pharmaceutical industry: new estimates of R&D costs. J Health Econ 2016;47:20–33.
- Fahimian G, Zahiri J, Arab SS et al. RepCOOL: computational drug repositioning via integrating heterogeneous biological networks. J Transl Med 2020;18:375–10.
- Falls Z, Fine J, Chopra G, et al. Accurate prediction of inhibitor binding to HIV-1 protease using CANDOCK. Front Chem 2022;9:775513.
- Falls Z, Mangione W, Schuler J et al. Exploration of interaction scoring criteria in the CANDO platform. BMC Res Notes 2019;12:318–6.
- Fine J, Konc J, Samudrala R et al. CANDOCK: chemical atomic network–based hierarchical flexible docking algorithm using generalized statistical potentials. J Chem Inf Model 2020;60:1509–27.
- Fine J, Lackner R, Samudrala R et al. Computational chemoproteomics to understand the role of selected psychoactives in treating mental health indications. Sci Rep 2019;9:13155.
- Fiscon G, Conte F, Farina L et al. SAveRUNNER: a network-based algorithm for drug repurposing and its application to COVID-19. PLOS Comput Biol 2021;17:e1008686.
- Galindez G, Matschinske J, Rose TD et al. Lessons from the COVID-19 pandemic for advancing computational drug repurposing strategies. Nat Comput Sci 2021;1:33–41.
- Gao C-Q, Zhou Y-K, Xin X-H et al. DDA-SKF: predicting drug–disease associations using similarity kernel fusion. Front Pharmacol 2022;12:784171.
- Garcia JA, Hutson TE, Shepard D et al. Gemcitabine and docetaxel in metastatic, castrate–resistant prostate cancer: results from a phase 2 trial. Cancer 2011;117:752–7.
- Gottlieb A, Stein GY, Ruppin E et al. PREDICT: a method for inferring novel drug indications with application to personalized medicine. Mol Syst Biol 2011;7:496.
- Gu Y, Zheng S, Yin Q et al. REDDA: integrating multiple biological relations to heterogeneous graph neural network for drug–disease association prediction. Comput Biol Med 2022;150:106127.
- Guala D, Sonnhammer EL. Network crosstalk as a basis for drug repurposing. Front Genet 2022;13:921286.
- Han X, Kong Q, Liu C et al. SubtypeDrug: a software package for prioritization of candidate cancer subtype–specific drugs. Bioinformatics 2021;37:2491–3.
- Hospira Inc. Methotrexate Injection [Package Insert]. Drugs@FDA. 2023. https://www.accessdata.fda.gov/drugsatfda_docs/label/2024/011719Orig1s136lbl.pdf (23 April 2025, date last accessed).
- Hospira Inc. Gemcitabine Injection [Package Insert]. Drugs@FDA. 2024. https://www.accessdata.fda.gov/drugsatfda_docs/label/2024/200795s018lbl.pdf (23 April 2025, date last accessed).
- Hu G, Agarwal P. Human disease–drug network based on genomic expression profiles. PLoS One 2009;4:e6536.
- Huang W, Li Z, Kang Y et al. Drug repositioning based on the enhanced message passing and hypergraph convolutional networks. Biomolecules 2022;12:1666.
- Huang Y, Bin Y, Zeng P et al. NetPro: neighborhood interaction–based drug repositioning via label propagation. IEEE/ACM Trans Comput Biol Bioinform 2023;20:2159–69.
- Huang Y, Dong D, Zhang W et al. DrugRepoBank: a comprehensive database and discovery platform for accelerating drug repositioning. Database 2024;2024:baae051.
- Hudson ML, Samudrala R. Multiscale virtual screening optimization for shotgun drug repurposing using the CANDO platform. Molecules 2021;26:2581.
- Ianevski A, Kushnir A, Nader K et al. RepurposeDrugs: an interactive web-portal and predictive platform for repurposing mono- and combination therapies. Brief Bioinform 2024;25:bbae328.
- Janssen Pharmaceutical Companies. Erleada (Apalutamide) [Package Insert]. Drugs@FDA. 2024. https://www.accessdata.fda.gov/drugsatfda_docs/label/2024/210951s016lbl.pdf (23 April 2025, date last accessed).
- Jenwitheesuk E, Horst JA, Rivas KL et al. Novel paradigms for drug discovery: computational multitarget screening. Trends Pharmacol Sci 2008;29:62–71.
- Jenwitheesuk E, Samudrala R. Identification of potential multitarget antimalarial drugs. JAMA 2005;294:1490–1.
- Jiang H-J, You Z-H, Huang Y-A. Predicting drug–disease associations via sigmoid kernel–based convolutional neural networks. J Transl Med 2019;17:382–11.
- John L, Soujanya Y, Mahanta HJ et al. Chemoinformatics and machine learning approaches for identifying antiviral compounds. Mol Inform 2022;41:2100190.
- Kang H, Hou L, Gu Y et al. Drug–disease association prediction with literature based multi–feature fusion. Front Pharmacol 2023;14:1205144.
- Landrum G. Rdkit. 2010. https://www.rdkit.org/ (23 April 2025, date last accessed).
- Lee J-L, Ahn J-H, Choi MK et al. Gemcitabine–oxaliplatin plus prednisolone is active in patients with castration-resistant prostate cancer for whom docetaxel-based chemotherapy failed. Br J Cancer 2014;110:2472–8.
- Li G, Hilgenfeld R, Whitley R et al. Therapeutic strategies for COVID-19: progress and lessons learned. Nat Rev Drug Discov 2023;22:449–75.
- Li Z, Huang Q, Chen X et al. Identification of drug–disease associations using information of molecular structures and clinical symptoms via deep convolutional neural network. Front Chem 2020;7:924.
- Liang X, Zhang P, Yan L et al. LRSSL: predict and interpret drug–disease associations based on data integration using sparse subspace learning. Bioinformatics 2017;33:1187–96.
- Lin H-H, Zhang Q-R, Kong X et al. Machine learning prediction of antiviral–HPV protein interactions for anti–HPV pharmacotherapy. Sci Rep 2021;11:24367.
- Lin K, Li L, Dai Y et al. A comprehensive evaluation of connectivity methods for L1000 data. Brief Bioinform 2020;21:2194–205.
- Lucchetta M, Pellegrini M. Drug repositioning by merging active subnetworks validated in cancer and COVID-19. Sci Rep 2021;11:19839.
- Luo H, Li M, Wang S et al. Computational drug repositioning using low–rank matrix approximation and randomized algorithms. Bioinformatics 2018;34:1904–12.
- Luo H, Wang J, Li M et al. Drug repositioning based on comprehensive similarity measures and bi–random walk algorithm. Bioinformatics 2016;32:2664–71.
- Madushanka A, Laird E, Clark C et al. SmartCADD: AI–QM empowered drug discovery platform with explainability. J Chem Inf Model 2024;64:6799–813.
- Mammen MJ, Tu C, Morris MC et al. Proteomic network analysis of bronchoalveolar lavage fluid in ex–smokers to discover implicated protein targets and novel drug treatments for chronic obstructive pulmonary disease. Pharmaceuticals 2022;15:566.
- Mangione W, Falls Z, Chopra G et al. Cando.py: open source software for predictive bioanalytics of large scale drug–protein–disease data. J Chem Inf Model 2020a;60:4131–6.
- Mangione W, Falls Z, Melendy T et al. Shotgun drug repurposing biotechnology to tackle epidemics and pandemics. Drug Discov Today 2020b;25:1126–8.
- Mangione W, Falls Z, Samudrala R. Optimal COVID-19 therapeutic candidate discovery using the CANDO platform. Front Pharmacol 2022;13:970494.
- Mangione W, Falls Z, Samudrala R. Effective holistic characterization of small molecule effects using heterogeneous biological networks. Front Pharmacol 2023;14:1113007.
- Mangione W, Samudrala R. Identifying protein features responsible for improved drug repurposing accuracies using the CANDO platform: implications for drug design. Molecules 2019;24:167.
- Martinez V, Navarro C, Cano C et al. DrugNet: network-based drug–disease prioritization by integrating heterogeneous data. Artif Intell Med 2015;63:41–9.
- Meng Y, Jin M, Tang X et al. Drug repositioning based on similarity constrained probabilistic matrix factorization: COVID-19 as a case study. Appl Soft Comput 2021;103:107135.
- Meng Y, Lu C, Jin M et al. A weighted bilinear neural collaborative filtering approach for drug repositioning. Brief Bioinform 2022;23:bbab581.
- Meng Y, Wang Y, Xu J et al. Drug repositioning based on weighted local information augmented graph neural network. Brief Bioinform 2024;25:bbad431.
- Meyer P, Saez-Rodriguez J. Advances in systems biology modeling: 10 years of crowdsourcing DREAM challenges. Cell Syst 2021;12:636–53.
- Michael S, Isern S, Garry R et al. Optimized dengue virus entry inhibitory peptide (1oan1). US Patent 8637472B2. 2014.
- Michael SF, Isern S, Garry R et al. Optimized dengue virus entry inhibitory peptide (dn81). US Patent 8541377. 2013.
- Minie M, Chopra G, Sethi G et al. CANDO and the infinite drug discovery frontier. Drug Discov Today 2014;19:1353–63.
- Moukheiber L, Mangione W, Moukheiber M et al. Identifying protein features and pathways responsible for toxicity using machine learning and Tox21: implications for predictive toxicology. Molecules 2022;27:3021.
- Moult J, Pedersen JT, Judson R et al. A large-scale experiment to assess protein structure prediction methods. Proteins 1995;23:ii–v.
- Mullard A. New drugs cost US$2.6 billion to develop. Nat Rev Drug Discov 2014;13:877.
- Muratov EN, Amaro R, Andrade CH et al. A critical overview of computational approaches employed for COVID-19 drug discovery. Chem Soc Rev 2021;50:9121–51.
- Nicholson CO, Costin JM, Rowe DK et al. Viral entry inhibitors block dengue antibody-dependent enhancement in vitro. Antiviral Res 2011;89:71–4.
- Overhoff B, Falls Z, Mangione W et al. A deep-learning proteomic-scale approach for drug design. Pharmaceuticals 2021;14:1277.
- Palanikumar L, Karpauskaite L, Al-Sayegh M et al. Protein mimetic amyloid inhibitor potently abrogates cancer-associated mutant p53 aggregation and restores tumor suppressor function. Nat Commun 2021;12:3962.
- Park J-H, Cho Y-R. Computational drug repositioning with attention walking. Sci Rep 2024;14:10072.
- Paul SM, Mytelka DS, Dunwiddie CT et al. How to improve R&D productivity: the pharmaceutical industry’s grand challenge. Nat Rev Drug Discov 2010;9:203–14.
- Pedregosa F, Varoquaux G, Gramfort A et al. Scikit-learn: machine learning in Python. J Mach Learn Res 2011;12:2825–30.
- Peng L, Shen L, Xu J et al. Prioritizing antiviral drugs against SARS-CoV-2 by integrating viral complete genome sequences and drug chemical structures. Sci Rep 2021;11:6248.
- Peters B, Brenner SE, Wang E et al. Putting benchmarks in their rightful place: the heart of computational biology. PLoS Comput Biol 2018;14:e1006494.
- Pushpakom S, Iorio F, Eyers PA et al. Drug repurposing: progress, challenges and recommendations. Nat Rev Drug Discov 2019;18:41–58.
- Rifaioglu AS, Nalbat E, Atalay V et al. DEEPScreen: high performance drug–target interaction prediction with convolutional neural networks using 2-D structural compound representations. Chem Sci 2020;11:2531–57.
- Sadybekov AV, Katritch V. Computational approaches streamlining drug discovery. Nature 2023;616:673–85.
- Schuler J, Falls Z, Mangione W et al. Evaluating the performance of drug-repurposing technologies. Drug Discov Today 2022;27:49–64.
- Schuler J, Samudrala R. Fingerprinting CANDO: increased accuracy with structure- and ligand-based shotgun drug repurposing. ACS Omega 2019;4:17393–403.
- Schulman A, Rousu J, Aittokallio T et al. Attention-based approach to predict drug–target interactions across seven target superfamilies. Bioinformatics 2024;40:btae496.
- Sethi G, Chopra G, Samudrala R. Multiscale modelling of relationships between protein classes and drug behavior across all diseases using the CANDO platform. Mini Rev Med Chem 2015;15:705–17.
- Shaker B, Ahmad S, Lee J et al. In silico methods and tools for drug discovery. Comput Biol Med 2021;137:104851.
- Shen C, Song J, Hsieh C-Y et al. DrugFlow: an AI-driven one-stop platform for innovative drug discovery. J Chem Inf Model 2024;64:5381–91.
- Sissung TM, Thordardottir S, Gardner ER et al. Current status of thalidomide and CC-5013 in the treatment of metastatic prostate cancer. Anticancer Agents Med Chem 2009;9:1058–69.
- Smith M, Lawson A, Kirk D et al. Low dose methotrexate and doxorubicin in hormone-resistant prostatic cancer. Br J Urol 1990;65:513–6.
- Su X, Hu L, You Z et al. A deep learning method for repurposing antiviral drugs against new viruses via multi-view nonnegative matrix factorization and its application to SARS-CoV-2. Brief Bioinform 2022;23:bbab526.
- Su X, You Z, Wang L et al. SANE: a sequence combined attentive network embedding model for COVID-19 drug repositioning. Appl Soft Comput 2021;111:107831.
- Sun X, Wang B, Zhang J et al. Partner-specific drug repositioning approach based on graph convolutional network. IEEE J Biomed Health Inform 2022;26:5757–65.
- Talele TT, Khedkar SA, Rigby AC. Successful applications of computer aided drug discovery: moving drugs from concept to the clinic. Curr Top Med Chem 2010;10:127–41.
- Tang X, Cai L, Meng Y et al. Indicator regularized non-negative matrix factorization method-based drug repurposing for COVID-19. Front Immunol 2021;11:603615.
- Tanimoto TT. IBM internal report, 17th Nov. Armonk, NY: IBM Corporation, 1957.
- Tanoli Z, Fernández-Torras A, Özcan UO et al. Computational drug repurposing: approaches, evaluation of in silico resources and case studies. Nat Rev Drug Discov 2025a;24:521–42.
- Tanoli Z, Schulman A, Aittokallio T. Validation guidelines for drug-target prediction methods. Expert Opin Drug Discov 2025b;20:31–45.
- Tanoli Z, Seemab U, Scherer A et al. Exploration of databases and methods supporting drug repurposing: a comprehensive survey. Brief Bioinform 2021a;22:1656–78.
- Tanoli Z, Vähä-Koskela M, Aittokallio T. Artificial intelligence, machine learning, and drug repurposing in cancer. Expert Opin Drug Discov 2021b;16:977–89.
- Tayara H, Abdelbaky I, To Chong K. Recent omics-based computational methods for COVID-19 drug discovery and repurposing. Brief Bioinform 2021;22:bbab339.
- Varsou D-D, Nikolakopoulos S, Tsoumanis A et al. Enalos suite: new cheminformatics platform for drug discovery and computational toxicology. Comput Toxicol 2018:287–311.
- Vink R, de Gooijer S, Beedie A et al. Python Polars 0.20.8. GitHub. 2024. https://github.com/pola-rs/polars (25 July 2025, date last accessed).
- Virtanen P, Gommers R, Oliphant TE et al.; SciPy 1.0 Contributors. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat Methods 2020;17:261–72.
- Vlietstra WJ, Vos R, Sijbers AM et al. Using predicate and provenance information from a knowledge graph for drug efficacy screening. J Biomed Semant 2018;9:1–10.
- Wang W, Yang S, Li J. Drug target predictions based on heterogeneous graph inference. In: Altman RB, Dunker AK, Hunter L et al. (eds), Pacific Symposium on Biocomputing, Vol. 18. Singapore: World Scientific, 2013a, 53–64.
- Wang W, Yang S, Zhang X et al. Drug repositioning by integrating target information through a heterogeneous network model. Bioinformatics 2014;30:2923–30.
- Wang X, Xin B, Tan W et al. DeepR2cov: deep representation learning on heterogeneous drug networks to discover anti-inflammatory agents for COVID-19. Brief Bioinform 2021a;22:bbab226.
- Wang Y, Chen S, Deng N et al. Drug repositioning by kernel-based integration of molecular structure, molecular activity, and phenotype data. PLoS One 2013b;8:e78518.
- Wang Y, Deng G, Zeng N et al. Drug–disease association prediction based on neighborhood information aggregation in neural networks. IEEE Access 2019a;7:50581–7.
- Wang Y, Yang Y, Chen S et al. DeepDRK: a deep learning framework for drug repurposing through kernel-based multi-omics integration. Brief Bioinform 2021b;22:bbab048.
- Wang Y-Y, Cui C, Qi L et al. DrPOCS: drug repositioning based on projection onto convex sets. IEEE/ACM Trans Comput Biol Bioinform 2019b;16:154–62.
- Wang Z, Zhou M, Arnold C. Toward heterogeneous information fusion: bipartite graph convolutional networks for in silico drug repurposing. Bioinformatics 2020;36:i525–33.
- Weber LM, Saelens W, Cannoodt R et al. Essential guidelines for computational method benchmarking. Genome Biol 2019;20:125.
- Wishart DS, Knox C, Guo AC et al. DrugBank: a comprehensive resource for in silico drug discovery and exploration. Nucleic Acids Res 2006;34:D668–72.
- Wouters OJ, McKee M, Luyten J. Estimated research and development investment needed to bring a new medicine to market, 2009–2018. JAMA 2020;323:844–53.
- Wu J, Li J, He Y et al. DrugSim2DR: systematic prediction of drug functional similarities in the context of specific disease for drug repurposing. GigaScience 2023;12:giad104.
- Wu J, Li X, Wang Q et al. DRviaSPCN: a software package for drug repurposing in cancer via a subpathway crosstalk network. Bioinformatics 2022a;38:4975–7.
- Wu Y, Liu Q, Qiu Y et al. Deep learning prediction of chemical-induced dose-dependent and context-specific multiplex phenotype responses and its application to personalized Alzheimer’s disease drug repurposing. PLoS Comput Biol 2022b;18:e1010367.
- Wu Z, Ramsundar B, Feinberg EN et al. MoleculeNet: a benchmark for molecular machine learning. Chem Sci 2018;9:513–30.
- Xie G, Li J, Gu G et al. BGMSDDA: a bipartite graph diffusion algorithm with multiple similarity integration for drug–disease association prediction. Mol Omics 2021;17:997–1011.
- Xu D, Zhang J, Roy A et al. Automated protein structure modeling in CASP9 by I-TASSER pipeline combined with QUARK-based ab initio folding and FG-MD-based structure refinement. Proteins 2011;79 Suppl 10:147–60.
- Yan C, Suo Z, Wang J et al. DACPGTN: drug ATC code prediction method based on graph transformer network for drug discovery. Front Pharmacol 2022;13:907676.
- Yang C, Zhang H, Chen M et al. A survey of optimal strategy for signature-based drug repositioning and an application to liver cancer. eLife 2022;11:e71880.
- Yang J, Roy A, Zhang Y. Protein–ligand binding site recognition using complementary binding-specific substructure comparison and sequence profile alignment. Bioinformatics 2013;29:2588–95.
- Yang J, Yan R, Roy A et al. The I-TASSER Suite: protein structure and function prediction. Nat Methods 2015;12:7–8.
- Yang M, Luo H, Li Y et al. Drug repositioning based on bounded nuclear norm regularization. Bioinformatics 2019a;35:i455–63.
- Yang M, Wu G, Zhao Q et al. Computational drug repositioning based on multi-similarities bilinear matrix factorization. Brief Bioinform 2021;22:bbaa267.
- Yang X, Zamit L, Liu Y et al. Additional neural matrix factorization model for computational drug repositioning. BMC Bioinformatics 2019b;20:423.
- Yang Y, Chen L. Identification of drug–disease associations by using multiple drug and disease networks. Curr Bioinform 2022;17:48–59.
- Yi H-C, You Z-H, Wang L et al. In silico drug repositioning using deep learning and comprehensive similarity measures. BMC Bioinformatics 2021;22:293.
- Yu L, Gao L. Human pathway-based disease network. IEEE/ACM Trans Comput Biol Bioinform 2019;16:1240–9.
- Yu L, Zhao J, Gao L. Predicting potential drugs for breast cancer based on miRNA and tissue specificity. Int J Biol Sci 2018;14:971–82.
- Yu Z, Huang F, Zhao X et al. Predicting drug–disease associations through layer attention graph convolutional network. Brief Bioinform 2021;22:bbaa243.
- Zeng X, Zhu S, Liu X et al. deepDR: a network-based deep learning approach to in silico drug repositioning. Bioinformatics 2019;35:5191–8.
- Zhang M-L, Zhao B-W, Su X-R et al. RLFDDA: a meta-path based graph representation learning model for drug–disease association prediction. BMC Bioinformatics 2022a;23:516.
- Zhang S-D, Gant TW. A simple and robust method for connecting small-molecule drugs using gene-expression signatures. BMC Bioinformatics 2008;9:258.
- Zhang W, Xu H, Li X et al. DRIMC: an improved drug repositioning approach using Bayesian inductive matrix completion. Bioinformatics 2020;36:2839–47.
- Zhang W, Yue X, Huang F et al. Predicting drug–disease associations and their therapeutic function based on the drug–disease association bipartite network. Methods 2018a;145:51–9.
- Zhang W, Yue X, Lin W et al. Predicting drug–disease associations by using similarity constrained matrix factorization. BMC Bioinformatics 2018b;19:233.
- Zhang Y. I-TASSER server for protein 3D structure prediction. BMC Bioinformatics 2008;9:40–8.
- Zhang Y, Lei X, Pan Y et al. Drug repositioning with GraphSAGE and clustering constraints based on drug and disease networks. Front Pharmacol 2022b;13:872785.
- Zhang Y, Luo M, Wu P et al. Application of computational biology and artificial intelligence in drug design. Int J Mol Sci 2022c;23:13568.
- Zhao B-W, Hu L, You Z-H et al. HINGRL: predicting drug–disease associations with graph representation learning on heterogeneous information networks. Brief Bioinform 2022a;23:bbab515.
- Zhao B-W, Su X-R, Hu P-W et al. A geometric deep learning framework for drug repositioning over heterogeneous information networks. Brief Bioinform 2022b;23:bbac384.
- Zhao B-W, Wang L, Hu P-W et al. Fusing higher and lower-order biological information for drug repositioning via graph representation learning. IEEE Trans Emerg Topics Comput 2024;12:163–76.
- Zheng Y, Wu Z. A machine learning-based biological drug–target interaction prediction method for a tripartite heterogeneous network. ACS Omega 2021;6:3037–45.
- Zhou L, Wang J, Liu G et al. Probing antiviral drugs against SARS-CoV-2 through virus–drug association prediction based on the KATZ method. Genomics 2020;112:4427–34.
- Zhou Y, Zhang Y, Zhao D et al. TTD: therapeutic target database describing target druggability information. Nucleic Acids Res 2024;52:D1465–77.
- Zhu H. Big data and artificial intelligence modeling for drug discovery. Annu Rev Pharmacol Toxicol 2020;60:573–89.
- Zhu Y, Brettin T, Evrard YA et al. Ensemble transfer learning for the prediction of anti-cancer drug response. Sci Rep 2020;10:18040.
- Zong N, Chowdhury S, Zhou S et al. Advancing efficacy prediction for electronic health records based emulated trials in repurposing heart failure therapies. NPJ Digit Med 2025;8:306.
Data Availability Statement
CANDO is publicly available through GitHub at https://github.com/ram-compbio/CANDO. Supplementary data, drug-indication interaction matrices, and drug-indication mappings are available at http://compbio.buffalo.edu/data/mc_cando_benchmarking2. Data and code are also archived via Zenodo at https://doi.org/10.5281/zenodo.17241534.