AMIA Annual Symposium Proceedings
2024 Jan 11;2023:559–568.

Validation approaches for computational drug repurposing: a review

Malvika Pillai 1,2, Di Wu 2
PMCID: PMC10785886  PMID: 38222367

Introduction

Drug discovery and development is a costly and time-consuming process that has stagnated the entry of new drugs into the market. The traditional process for drug development can take approximately 12 to 16 years and cost approximately $1 to $2 billion [1]. Preclinical research consists of laboratory and animal testing of a drug compound. Subsequently, Phase I through III clinical trials determine drug safety, efficacy, and therapeutic effect, respectively. In the United States, the drug candidate then undergoes Food and Drug Administration (FDA) review. Given the high cost and time burden of this traditional process, determining whether an existing drug can be repositioned (i.e., repurposed) to treat a different disease outside its labeled indications is a more cost-effective alternative that addresses many of the barriers to bringing a drug to market.

Drug repurposing is defined as the process of applying known drugs/compounds that are already on the market to new disease indications. Repurposed drugs can be exempt from the phases preceding Phase II and III clinical trials and the FDA approval process, reducing time and cost. For example, a liberal estimate of the cost and time required to repurpose a drug is approximately $300 million over approximately 6 years [1]. The risk of failure is lower for repurposed drugs because candidates for late-stage repurposing have already been proven safe in preclinical models and in humans [2]. By building on prior preclinical testing, drug repurposing shortens the time and reduces the cost of finding a drug for a different disease, with positive downstream effects on patient-level and population health outcomes.

Due to the serendipitous nature of early discoveries, there has been a push toward data-driven repurposed drug development, which allows for more consistent hypothesis generation and responds to the recent availability of large-scale biomedical datasets (e.g., risk single nucleotide polymorphisms (SNPs) identified in genome-wide association studies (GWAS) and protein interaction databases) and clinical datasets (e.g., electronic medical records (EMRs)). Computational drug repurposing uses computational approaches for systematic data analysis to form drug repurposing hypotheses. A rigorous drug repurposing pipeline mainly involves making connections between two components: existing drugs and the diseases that need drug treatments. The connection is built with computational tools from features, collected experimentally or clinically, that represent or describe these two components, particularly when the feature datasets are large and high dimensional. After hypothesis generation, the pipeline also involves later validation steps (Figure 1).

Figure 1. Drug repurposing workflow

The push toward data-driven drug repurposing has led to an increase in computational drug repurposing efforts. Conventional drug development follows a ‘one drug, one target’ paradigm that does not evaluate off-target effects or multiple drug indications [3]. Computational approaches are intended to build direct or indirect connections between known drugs and diseases at high throughput in an automated way. We define two main steps for a more complete drug repurposing pipeline for a disease. First, in the prediction step, researchers use the drug-disease connection to predict repurposed drug candidates computationally, producing predicted repurposed drug candidates. Second, in the validation step, to reduce false positives, researchers draw on independent information not used in the prediction step, such as previous experimental/clinical studies or independent resources/aspects of data about the drug-disease connection (e.g., protein interaction data and gene expression data). Supporting evidence found in this step builds confidence in a repurposed drug, producing a validated repurposed drug candidate; false positive candidates may be removed from the repurposing list.
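The two-step pipeline described above can be sketched in a few lines of Python. This is an illustrative sketch only; the drug-disease pairs, scores, threshold, and evidence set below are hypothetical placeholders, not data or methods from any study in this review.

```python
# Sketch of the two-step repurposing pipeline: computational prediction,
# then validation against independent evidence not used in prediction.
# All pairs, scores, and evidence below are hypothetical placeholders.

def predict_candidates(drug_disease_scores, threshold=0.8):
    """Step 1: keep drug-disease pairs whose computational score passes a cutoff."""
    return {pair for pair, score in drug_disease_scores.items() if score >= threshold}

def validate_candidates(candidates, independent_evidence):
    """Step 2: retain only candidates supported by independent evidence
    (e.g., literature mentions, trial records); the rest are likely false positives."""
    return {pair for pair in candidates if pair in independent_evidence}

scores = {("metformin", "cancer"): 0.91,
          ("aspirin", "diabetes"): 0.85,
          ("drugX", "diseaseY"): 0.60}
evidence = {("metformin", "cancer")}  # e.g., found in an independent trial registry

validated = validate_candidates(predict_candidates(scores), evidence)
# validated -> {("metformin", "cancer")}
```

The key design point mirrored from the text is that the evidence set must be independent of the data used to produce the scores; otherwise the "validation" is circular.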

In this paper, we conduct a systematized review of computational drug repurposing and its follow-up validation strategies. Many publications describe how to predict repurposed drugs and validate them as repurposing candidates using experiments, available databases, and computational methods. We included all studies using computational and/or experimental validation.

A previous review by Brown et al [4] described three validation strategies for computational drug repurposing, concluding that there is widespread variation in how authors validate predicted repurposed drug candidates. They structured their query to search for studies with two keywords: drug repurposing and validation. After collecting all drug repurposing studies that mention variations of the term “validation”, they excluded all non-computational studies. They defined analytic validation as the comparison of computational results to existing biomedical knowledge using analytical metrics such as sensitivity and specificity. Their study suggested that analytic validation is a more rigorous form of validation; however, their review was conducted with a limited study pool because of the search query terms selected. Whereas Brown et al [4] captured types of analytic validation and focused on computational validation only, the purpose of our review is to identify how researchers present any supporting evidence for their computational drug repurposing predictions, extending their review to include studies with both computational and non-computational types of validation.

This review aims to answer the research question: how do researchers provide validation for drug repurposing candidate predictions from computational methods? We examine the types of validation for drug candidates in computational drug repurposing studies, compare validation approaches, and describe the trade-offs of each. We consider studies with validation to be those that provide evidence to push a drug candidate along the drug development pipeline, and we provide recommendations for researchers deciding how to validate drug repurposing candidates.

Methods

Search strategy

The methodology presented in the PRISMA Statement for systematic reviews [5] was used to create the search strategy. A comprehensive search was conducted across three databases: PubMed, Web of Science, and ACM Digital Library for all relevant articles pertaining to computational methods for drug repurposing. Both peer-reviewed journal articles and conference proceedings were included in the review. The search was conducted on September 12, 2019 with the query: (drug repurpos* OR drug reposition*) AND (computational OR computation OR computations OR algorithm OR algorithms OR network OR networks OR machine learning OR deep learning OR prediction OR predictions). The query resulted in 3086 studies total with 996 from PubMed, 1144 from Web of Science, and 946 from ACM Digital Library. The PRISMA flow diagram is presented in Figure 2.

Figure 2. PRISMA Flow Diagram

Inclusion and exclusion criteria

The inclusion criteria for this review were: (1) the paper focused on drug repurposing candidate prediction and (2) the paper used a computational method for prediction. A study was excluded from the review if it: (1) did not include computational or experimental validation of predictions, (2) did not relate to drug repurposing, (3) was a non-computational paper, (4) was not an independent study (i.e., a review or perspective), (5) was not a full paper (i.e., an abstract for a poster), (6) was a duplicate paper, or (7) was not research in humans.

Study evaluation and data extraction

Covidence software was used for article screening [6]. Extracted data included: number of citations, whether the paper was condition-specific, computational method used, and validation method used. Quality assessment was conducted with a citation analysis.

Results

The search across PubMed, Web of Science, and ACM Digital Library identified 3086 articles. After filtering out duplicates, 2386 articles were included in the screening process. In abstract screening, 1654 studies were excluded for not being related to drug repurposing candidate prediction, not using a computational method, not being research in humans, or not being an independent study (i.e., a review or perspective). 732 studies were assessed for full-text eligibility. In full-text screening, 603 papers did not contain either computational or experimental validation methods, 43 were not about drug repurposing candidate prediction, 22 were non-computational, 16 were review papers, 15 were not full papers, 6 were duplicate papers, and 2 were not research in humans (Figure 2).

For studies to push drug repurposing candidates forward in the drug discovery process, a drug candidate requires validation (i.e., supporting evidence). Two kinds of validation will be discussed: computational validation and non-computational validation. Computational validation methods found consist of retrospective clinical analysis, literature support, public database search, testing with benchmark datasets, and online resource search. Non-computational validation methods found consist of in vitro, in vivo, or ex vivo experiments, drug repurposing clinical trials, and expert review of predictions. Many studies use multiple forms of validation. Studies using both computational and non-computational validation are described in detail (See C. Both Computational and non-computational validation).

A. Computational validation

266 studies only contained computational validation.

A1. Retrospective clinical analysis

Validation with retrospective clinical analysis can be divided into two categories: studies using electronic health record (EHR) or insurance claims data to validate drug repurposing candidates and studies searching for existing clinical trials. Both forms of validation are used on their own and in combination with other forms of validation. Studies that search for existing clinical trials to validate drug candidate predictions generally use the clinical trials database (clinicaltrials.gov) to find trials testing the potential of predictions made within the studies. Evaluation datasets can also be compiled from the database to test the performance of a drug repurposing system on a larger scale. Having existing clinical trials as support is vital information about a drug candidate because it indicates that the drug has already passed through hurdles in the drug discovery process [7].

There is no clear weakness in this approach; however, knowing the phase (I-III) of the clinical trials is important to evaluate how much validation is provided. Passing Phase I clinical trials has different clinical and regulatory implications than passing Phases I and II. While some studies differentiated by clinical trial phase [8], others extracted drug-disease connections from clinical trials into datasets without specifying the phases [9]. EHR or insurance claims data, as part of retrospective clinical analysis, have traditionally been used to examine off-label usage of drugs; finding off-label usage is another strong form of validation because it provides evidence that a drug has efficacy in humans for a given indication [10, 11, 12]. However, there are privacy and data accessibility issues when considering clinical records for validation, unlike many of the publicly available validation methods described in this review.
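The phase-aware weighting of clinical-trial support described above can be illustrated with a minimal Python sketch. The trial records and phase labels below are invented placeholders, not real ClinicalTrials.gov entries, and the ranking scheme is an assumption for illustration.

```python
# Hypothetical sketch: rank clinical-trial support by phase, since the review
# notes that Phase I vs. Phase I+II support carries different implications.
# Trial records below are invented, not real registry entries.

PHASE_RANK = {"Phase 1": 1, "Phase 2": 2, "Phase 3": 3}

def strongest_trial_support(trials, drug, condition):
    """Return the highest trial phase rank found for a drug-condition pair,
    or None when no trial matches."""
    phases = [PHASE_RANK[t["phase"]]
              for t in trials
              if t["drug"] == drug and t["condition"] == condition]
    return max(phases) if phases else None

trials = [
    {"drug": "drugA", "condition": "diseaseX", "phase": "Phase 1"},
    {"drug": "drugA", "condition": "diseaseX", "phase": "Phase 2"},
    {"drug": "drugB", "condition": "diseaseX", "phase": "Phase 1"},
]
# strongest_trial_support(trials, "drugA", "diseaseX") -> 2 (Phase 2 support)
```

A candidate backed only by Phase 1 support would, under this sketch, rank below one with Phase 2 or 3 support, matching the distinction the text draws.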

A2. Literature support

166 studies solely used literature support, and over half of the studies in the review mention using literature to support drug candidate predictions in conjunction with other validation methods. There has been tremendous growth in the amount of biomedical literature published, with PubMed alone comprising over 30 million citations, which allows for different kinds of methods to extract information. The types of literature support are grouped into three categories: literature search, survey, and mining. Literature search validation uses a tool like PubMed to manually find relevant articles containing connections between old drugs and new uses. If no methods are described for extracting literature and only a citation is available, it is assumed that the authors used a literature search. Methods of prediction in studies that use literature for validation range across gene expression analysis, network or matrix manipulation, machine and deep learning, structure-based modeling or screening, and text or data mining models. The extent of literature support provided varies across studies, irrespective of the method used for prediction.

Literature search is the most prevalent method of validation found in the review. The strength or weakness of this approach comes from the studies selected as previous evidence. For example, Grenier et al [13] found existing clinical trials to support four of six predictions, and found literature validating the predictions with human cell lines and animal models. The literature evidence was described in detail, making this strong literature support because the drug-disease associations mentioned in the literature had been directly tested. However, compared to an in-depth description with case studies, providing citations without explanation, providing low-quality citations, or citing literature that lacks experimental evidence can be considered weaker validation.

Literature survey validation is defined as a person verifying a set of literature search results as true connections, which is more in-depth than a literature search. Two studies only used a literature survey to validate predictions (Table 2). A literature survey can be time consuming, but it is the most thorough literature support described in this review [14, 15]. Surveying literature consists of experts reading studies and deciding if the literature can be considered validation for predictions. The difference between a literature search and a literature survey is the expert opinion included, which ensures the quality of the supporting evidence provided. For example, Tan et al [15] had three experts read through the literature and include a study as validation if most experts agreed. Using expert opinion increases the confidence in the drug repurposing candidate and its validation.

Literature mining validation uses computational algorithms to analyze literature and verify connections. 9 studies mentioned using literature mining to validate predictions. Literature mining is the quickest approach to investigate previous evidence; however, all literature mining validation methods in the review used co-occurrence to illustrate the extent of evidence for a drug-disease co-mention [16, 17]. Using term co-occurrence for literature mining only provides basic information on whether a drug and disease have been mentioned together. Examining drug-disease co-occurrence does not provide information on drug efficacy, but it can demonstrate that the pair has been studied previously.
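A minimal sketch of co-occurrence literature mining as described above: count the texts in which a drug and a disease are mentioned together. The abstracts are invented one-line placeholders; real studies mine PubMed at scale and typically use named-entity recognition and synonym expansion rather than the plain substring matching shown here.

```python
# Toy co-occurrence literature mining: count documents mentioning both a
# drug term and a disease term. Abstracts below are invented placeholders.

def comention_count(abstracts, drug, disease):
    """Number of texts in which the drug and disease terms co-occur
    (case-insensitive substring match; a deliberate simplification)."""
    return sum(1 for text in abstracts
               if drug.lower() in text.lower() and disease.lower() in text.lower())

abstracts = [
    "Metformin reduced tumor growth in a colorectal cancer model.",
    "Metformin improves glycemic control in type 2 diabetes.",
    "Aspirin use and cancer incidence in a cohort study.",
]
# comention_count(abstracts, "metformin", "cancer") -> 1
```

As the text cautions, a nonzero count shows only that the pair has been studied together, not that the drug is efficacious for the disease.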

A3. Public database search

8 studies only used public databases to validate predictions (Table 2). Public databases can be data sources for predictive models, but after model training and testing, researchers can search public databases for existing drug indications, making them useful sources of supporting evidence for drug repurposing predictions. They are useful for both drug-disease and drug-target interaction (DTI) prediction, and many of the databases used for validation comprise the benchmark datasets discussed in A4. Benchmark dataset evaluation. In this review, searching the clinical trials database is considered different from searching for drug indications in other databases (See A1. Retrospective clinical analysis).

Public database search is a form of validation that is frequently used in combination with other methods of validation [18, 19] (Table 2). The strength of using public databases comes from the type of information provided in the database and the frequency at which the database is updated. The three most used databases, DrugBank [20], KEGG [21], and CTD [22], vary greatly in terms of domain but are all manually curated from various sources. The manual curation differentiates public database search from literature search and builds trust in the quality of associations described in the databases. The three databases are also updated regularly. While the databases are well reputed and referenced in the scientific community, the value of this validation approach also depends on how supporting evidence is extracted and examined within studies. For example, Peng et al [23] used case studies to describe evidence found in various databases and clearly explained how the predicted drugs and targets could interact, providing useful validation.

A4. Benchmark dataset evaluation

15 studies only used benchmark datasets to validate predictions (Table 2). Benchmark dataset evaluation is defined as the use of an independent dataset, separate from the data used to train the predictive model, to evaluate drug repurposing predictions. While public database search consists of researchers searching for indications of drugs individually, benchmark datasets are compiled from public databases and processed to allow for reproducible evaluation. When testing with benchmark datasets, the evaluation metrics used are especially important. Across the studies using benchmark datasets for validation, area under the receiver operating characteristic (ROC) curve (AUC or AUROC) is the most common metric, where a higher AUC indicates better performance. The ROC curve is created by plotting the true positive rate against the false positive rate. Other commonly used metrics include precision and area under the precision-recall curve (AUPR). The types of benchmark datasets also vary depending on the prediction task. For example, many DTI prediction and drug-disease association prediction studies use benchmark datasets to evaluate performance, but the type of dataset and extent of validation used for the two tasks differ.
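The AUROC metric discussed above can be computed directly from predicted scores and benchmark labels via its rank-based (Mann-Whitney) interpretation, without any plotting. The labels and scores below are toy values, not results from any reviewed study.

```python
# AUROC from first principles: the probability that a randomly chosen
# positive pair receives a higher score than a randomly chosen negative
# pair, counting ties as 1/2. Toy labels/scores only.

def auroc(labels, scores):
    """labels: 1 = known association in the benchmark, 0 = not known.
    scores: the model's predicted association scores."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = [1, 1, 0, 0, 0]
scores = [0.9, 0.6, 0.7, 0.3, 0.2]
# auroc(labels, scores) -> 5/6, i.e. about 0.833
```

An AUROC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is why higher values indicate better benchmark performance.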

Benchmark dataset evaluation is primarily used in drug repurposing studies using network analysis or machine learning methods for prediction. Validating with benchmark datasets like the Gottlieb et al [24] or Yamanishi et al [25] datasets is useful for comparing performance across prediction methods. A limitation of benchmark datasets is that they require updates to account for knowledge accrued since publication, and some studies have overcome this limitation by using them for training rather than validation. For example, Keum et al [26] trained a model on the Yamanishi et al [25] datasets and tested on a dataset with updated DTIs from the most recent versions of databases. Studies also included validation using benchmark datasets comprised of drug-disease pairs from clinical trials.

A5. Online resource support

Online resource search is defined as using websites with drug or condition information that are generally directed at consumers for validation. It is the weakest form of validation in this review and is only used in combination with other validation methods [27]. Commonly used websites include drugs.com and webMD.com. The information compiled on websites such as drugs.com may have reputed sources; however, without the source being described in a study, it is not possible to understand whether the validation presented is substantiated or not. Therefore, only stating that a prediction was mentioned on a website is not thorough validation, and further discussion is necessary.

B. Non-computational validation

123 studies only contained non-computational validation. Non-computational validation consists of expert review of predictions [28], experimental support [29], and drug repurposing clinical trials [30]. A drug repurposing clinical trial is defined as a clinical trial that resulted from a drug repurposing effort.

119 studies only used an in vitro, in vivo, or ex vivo experiment to validate predictions (Table 2). Experimental validation for drug candidates is crucial in the preclinical development stage of drug development. Therefore, computational drug repurposing studies that validate candidates experimentally satisfy criteria for early-stage repurposing, making this approach strong. With satisfactory experimental validation, there is enough supporting evidence to pursue a drug repurposing clinical trial. The weakness in this approach is based on the effort required to complete the experiments. In comparison to searching a public database or finding another study with evidence in literature, conducting in vitro assays or examining drug performance in an in vivo model is much more time and cost intensive. Methods used for prediction in the studies using experimental validation in this review were network analysis, gene expression analysis, molecular docking, machine learning, and similarity-based approaches.

Two studies performed alternate forms of non-computational validation. Grammer et al [30] performed a clinical trial to validate a drug candidate prediction, and Bakal et al [28] conducted an expert review of predictions. Of all the approaches used, expert review of predictions can be considered the weakest because it is based on experts describing what could be used in clinical practice. As it is based on human expertise, experiments still need to be conducted to verify hypotheses. A drug repurposing clinical trial is a strong form of validation, stronger than experimental support, as it satisfies the Phase I clinical trial requirement to determine the safety of a drug candidate in humans.

C. Both computational and non-computational validation

27 studies include both computational and in vitro, in vivo, or ex vivo experimental validation. The goal of using combinations of validation is to provide multi-faceted support for drug candidates to push drug repurposing candidates through the drug development process and inform clinical trials. Including experimental support satisfies the preclinical development stage in the drug development process.

Cheng et al [29] used a network analysis approach, constructing a protein-protein interactome for drug-cardiovascular outcome association prediction. Propensity score matching and sensitivity analyses were conducted to validate four associations using the Truven MarketScan and Optum Clinformatics databases, and two of the drugs, hydroxychloroquine and carbamazepine, were found to decrease and increase the risk of coronary artery disease (CAD), respectively. An in vitro assay using human aortic endothelial cells was used to validate the connection between hydroxychloroquine and decreased CAD risk. Chen et al [31] used virtual screening of two databases, CMap and LINCS, to predict drug candidates for hepatocellular carcinoma (HCC), and used existing clinical trials, in vitro assays, and an in vivo mouse model to validate predictions. Unlike other studies that searched for clinical trials with predicted drug-disease associations, a list of drugs from the clinical trials database was used to rank predictions and provide confidence in the approach. However, the candidate with the highest score had not yet been tested in preclinical models, so the study validated the connection between niclosamide ethanolamine and HCC using in vitro assays and an in vivo mouse model.
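The core idea behind the propensity-score-matched retrospective analyses mentioned above can be hinted at with a toy sketch of one-to-one nearest-neighbor matching on precomputed propensity scores. The treated/control units and scores below are hypothetical; real analyses, such as those in Cheng et al [29], estimate the scores from patient covariates (e.g., with logistic regression) and apply caliper and covariate-balance checks that this sketch omits.

```python
# Toy one-to-one nearest-neighbor matching on precomputed propensity scores.
# All unit IDs and scores are hypothetical placeholders.

def match_nearest(treated, controls):
    """Greedily pair each treated unit with the closest unmatched control
    by propensity score; returns a list of (treated_id, control_id) pairs."""
    available = dict(controls)  # control_id -> propensity score
    pairs = []
    for tid, tscore in treated:
        if not available:
            break
        cid = min(available, key=lambda c: abs(available[c] - tscore))
        pairs.append((tid, cid))
        del available[cid]  # each control is used at most once
    return pairs

treated = [("t1", 0.80), ("t2", 0.35)]
controls = [("c1", 0.78), ("c2", 0.40), ("c3", 0.10)]
# match_nearest(treated, controls) -> [("t1", "c1"), ("t2", "c2")]
```

Matching on propensity scores aims to compare drug-exposed and unexposed patients with similar likelihoods of exposure, so that observed outcome differences are less confounded.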

Prediction Methods Used with Validation Types

In each drug repurposing candidate prediction study in this review, two methods are used: a prediction method and a validation method. All studies were divided into categories based on validation type, and within each category, the most used prediction methods were examined. Given a validation type such as retrospective clinical analysis, the aim of the comparison is to understand which prediction methods can be validated using that validation method. Some types of input data for prediction are also used by different studies for validation, as described in this review. Table 4 shows the distribution of validation methods used and the most common prediction methods used with each validation subtype. The most used prediction methods across all validation subtypes were network analysis (e.g., non-neural networks such as protein-protein interaction networks), gene expression analysis (e.g., connectivity mapping), traditional machine or deep learning, and molecular docking. Molecular docking, a less popular prediction method, is treated here as an overarching category that includes virtual screening, pharmacophore modeling, and molecular docking itself.

Table 4.

Prediction methods used with validation subtypes. *For validation subtypes associated with more than 3 studies, the prediction methods are ranked in terms of frequency of use. For validation subtypes where only one prediction method was used across all associated studies, a ranking was not included.

Validation Type | Validation Subtype | Count of Studies Using Validation Subtype (N=416) | Top Prediction Methods Used With Validation Subtype*
Computational | Retrospective clinical analysis | 65 (15.6%) | 1. Network analysis; 2. Gene expression analysis; 3. Machine or deep learning
Computational | Literature support | 233 (56.0%) | 1. Network analysis; 2. Gene expression analysis; 3. Structural modeling
Computational | Benchmark dataset evaluation | 28 (6.7%) | 1. Network analysis; 2. Structural modeling; 3. Machine learning
Computational | Public database search | 40 (9.6%) | 1. Network analysis; 2. Machine or deep learning; 3. Matrix factorization
Computational | Online resource search | 5 (1.2%) | Network analysis
Non-computational | Experimental support | 146 (35.1%) | 1. Structural modeling; 2. Network analysis; 3. Gene expression analysis
Non-computational | Drug repurposing clinical trial | 1 (0.2%) | Gene expression analysis
Non-computational | Expert review of predictions | 1 (0.2%) | Machine learning

Network analysis was the most frequently used method overall, with 55% (229) of studies using it to predict drug repurposing candidates. Gene expression analysis was used for prediction mostly in studies using retrospective clinical analysis, literature support, or experimental support for validation; in the other validation categories, it was not among the top three methods. Gene expression analysis was applied in prediction by 12% (8/65) of studies using retrospective clinical analysis, 11% (26/233) of studies using literature support, and 25% (36/146) of studies using experimental support. Machine or deep learning methods were used for prediction by 13% (10/65) of studies using retrospective clinical analysis, 9% (22/233) using literature support, 11% (3/28) using benchmark dataset evaluation, 20% (8/40) using public database search, and 5% (8/146) using experimental support. While prediction methods were similar across validation types, a few differed from the majority. Molecular docking was used for prediction by 32% (47/146) of studies using experimental support, compared to 14% (4/28) using benchmark datasets, 11% (26/233) using literature support, and 2% (1/65) using retrospective clinical analysis. Matrix factorization was the third most common prediction method (6%, 3 studies) among studies using public database search, compared to 7% (2/28) using benchmark datasets, 2% (5/233) using literature support, and 2% (1/65) using retrospective clinical analysis.

Discussion

This review examined how researchers define and provide validation for computational drug repurposing candidates. Computational drug repurposing provides a systematic method for connecting approved drugs with new indications, reducing the time and cost of drug development. 628 studies using computational approaches for drug repurposing were identified in this review, showing the vast amount of research conducted in this area. However, predicting drug candidates without providing independent support does not provide enough evidence for a drug to be pursued further. 416 of the 628 studies, roughly two thirds, contained validation for predictions. Validation is needed to demonstrate the significance of a prediction, and the review has shown that the number of studies including validation has increased over time (Figure 3). There are various levels of validation, however, and not all types will allow drug candidates to progress faster in the drug development process.

Figure 3. Number of Studies Including Validation Over Time.

Nine types of validation are described in the review, and each has its strengths and weaknesses. They also overlap across studies, in that the same resources are used as input data for prediction in some studies and as validation in others (Table 2). Computational validation methods consist of retrospective clinical analysis, literature support, public database search, testing with benchmark datasets, and online resource search. The computational methods are ranked by strength of validation, from methods that provide enough support for a drug to continue to clinical trials to methods that do not provide enough support to move a drug through the drug development process. Non-computational validation methods are in vitro, in vivo, or ex vivo experiments, drug repurposing clinical trials, personalized patient treatment, and expert review of predictions. Many studies use multiple forms of validation, and studies using both computational and non-computational validation are described in detail (See C. Both computational and non-computational validation). For each validation type, Table 2 displays the number of studies using it as a single source of validation and the number using it in combination with other forms of validation. The strengths and weaknesses are based on how much we perceive that a form of validation will allow a drug to be pushed to market faster. Computational and non-computational validation are difficult to compare because non-computational validation is a required component of the drug development process unless it has already been conducted in past studies.

This review focuses on providing a broad overview of the methods researchers use to validate drug repurposing candidates by categorizing them and describing their uses. The validation methods were analyzed within their categories, and all nine categories were compared to each other. Five types of computational validation and three types of non-computational validation were found. There are varying definitions of validation in drug repurposing, and this is the first review to search for computational drug repurposing studies and explore how researchers provide independent supporting evidence as validation without prediction-method-focused exclusion criteria. In addition, this review examines the most common validation types used in combination with different prediction methods, providing guidance for researchers who wish to select a validation method based on their chosen prediction method and the trade-off between the strength and the time/cost of each validation method.

Recommendations

Given a computational drug repurposing study, there are various types of validation to choose from. In this review, strength of validation is determined by how close a drug candidate is to regulatory approval in the traditional drug development process after completing a given form of validation. In terms of strength, non-computational approaches such as experimental validation and clinical trials are far stronger than any computational approach. The non-computational approaches can be considered "true" validation, but they are more time and cost intensive. Computational approaches can be ranked, from strongest to weakest, as follows: retrospective clinical analysis, literature survey, literature search, literature mining, public database support, benchmark dataset evaluation, and online resource support. Retrospective clinical analysis is the strongest form of computational validation because it draws on off-label usage and clinical trial support. Off-label usage can demonstrate a drug candidate's effect in humans; however, beyond identifying off-label usage, further analysis must determine whether the drug's effect on a given condition was positive or negative. Clinical trial support can indicate drug safety and efficacy, depending on the trial stage. Searching for clinical trials is more straightforward than identifying off-label usage because trials are systematic and report clearly defined results for a drug's effect on a given condition. As both off-label usage and clinical trials demonstrate a drug candidate's effect in humans, this evidence can be considered stronger than validation with animal models and can lead to late-stage repurposing. Based on the review findings, computational evidence should be explored before conducting non-computational validation.
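To make the benchmark dataset evaluation in this ranking concrete, a study's prediction scores can be compared against a set of benchmark-confirmed drug-disease pairs, for example by computing the AUROC. A minimal sketch follows; the drug names, scores, and "known positive" set are purely hypothetical and not drawn from any reviewed study:

```python
def auroc(scores: dict[str, float], known_positives: set[str]) -> float:
    """AUROC via the rank-sum (Mann-Whitney) formulation: the probability
    that a randomly chosen benchmark-confirmed drug outranks a randomly
    chosen unconfirmed drug, counting ties as half a win."""
    pos = [s for drug, s in scores.items() if drug in known_positives]
    neg = [s for drug, s in scores.items() if drug not in known_positives]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical prediction scores for one disease; "aspirin" and "metformin"
# stand in for indications confirmed in a benchmark dataset.
scores = {"aspirin": 0.91, "metformin": 0.74, "drugX": 0.55, "drugY": 0.30}
print(auroc(scores, known_positives={"aspirin", "metformin"}))  # prints 1.0
```

An AUROC of 0.5 corresponds to random ranking, so values well above 0.5 on a held-out benchmark are what studies typically report as computational support.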

Limitations

This review explores how studies provide validation for computational drug repurposing, and a key limitation is the diversity in how researchers interpret validation. For example, within literature search support, some studies provide detailed case studies while others provide only citations as validation; both are considered validation, although one is far more thorough than the other. In addition, some validation methods are also used for prediction. For example, retrospective clinical analysis is a commonly used prediction method [10, 11, 12], but studies have also used it for hypothesis validation. Although this overlap between prediction and validation methods can be a limitation, it is mitigated as long as the prediction and validation methods within a single study are not the same.

Drug repurposing efforts boomed during the COVID-19 pandemic, with a dramatic increase in drug repurposing studies published between 2020 and 2021. Along with the increase in publications, drug repurposing became a household term, both as a beacon of hope in a time of uncertainty and as a cautionary tale underscoring the importance of validating drug repurposing candidates well before exposing patients to them. Although this review does not include peri-pandemic literature, it provides a comprehensive overview of ways researchers can validate their predictions with varying levels of support.

Conclusions

Validation for computational drug repurposing provides confidence in predicted drug repurposing candidates; the degree of confidence varies with the form of validation used. Studies using computational and non-computational validation approaches are described in this review. All non-computational validation methods can be summarized as expert opinion, animal testing, and clinical testing. Animal and clinical testing are undertaken in the traditional drug development process and are still required for the repurposed drug development process if there is no evidence that they have already been completed. All computational validation methods can be summarized as either finding overlaps between predicted associations and an accepted form of evidence or using analytical metrics to evaluate model performance. Finding overlaps between predicted associations and evidence such as literature or public database case studies can provide confidence by satisfying parts of the drug development process for early- or late-stage repurposing; literature support and public databases are used regardless of prediction method. Analytical metrics can provide confidence in the predictions through statistical significance and inform further non-computational validation for early-stage repurposing. As the main goal of drug repurposing is to reduce the time and money required to bring a drug to market, the main goal of validation is to shorten that process even further.
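The overlap-based validation summarized above is often made statistically explicit with an enrichment test: given a universe of candidate drugs, how surprising is the overlap between the predicted set and an evidence-supported set? A minimal standard-library sketch using the hypergeometric upper tail follows; all counts in the example are hypothetical, chosen only to illustrate the calculation:

```python
from math import comb

def overlap_enrichment_p(universe: int, reference: int,
                         predicted: int, overlap: int) -> float:
    """Hypergeometric upper-tail probability P(X >= overlap): the chance of
    seeing at least `overlap` reference-supported drugs among `predicted`
    candidates drawn without replacement from a universe of `universe` drugs
    that contains `reference` supported drugs."""
    total = comb(universe, predicted)
    return sum(
        comb(reference, k) * comb(universe - reference, predicted - k)
        for k in range(overlap, min(reference, predicted) + 1)
    ) / total

# Hypothetical counts: 1500 approved drugs in the universe, 40 with
# literature support for the disease, 20 predicted candidates, 6 of which
# fall in the literature-supported set.
p = overlap_enrichment_p(universe=1500, reference=40, predicted=20, overlap=6)
print(f"enrichment p-value: {p:.2e}")
```

A small p-value indicates the predicted set overlaps the evidence set more than chance would explain, which is the statistical-significance style of confidence the metrics-based validation methods provide.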

Acknowledgements

This work is supported by the National Institutes of Health (NIH) National Library of Medicine (NLM) Training grant (T15-LM012500).

References

1. Nosengo N. Can you teach old drugs new tricks? Nature. 2016;534(7607):314–6. doi: 10.1038/534314a.
2. Pushpakom S, Iorio F, Eyers PA, Escott KJ, Hopper S, Wells A, et al. Drug repurposing: progress, challenges and recommendations. Nat Rev Drug Discov. 2019;18(1):41–58. doi: 10.1038/nrd.2018.168.
3. Sun P, Guo J, Winnenburg R, Baumbach J. Drug repurposing by integrated literature mining and drug-gene-disease triangulation. Drug Discov Today. 2017;22(4):615–9. doi: 10.1016/j.drudis.2016.10.008.
4. Brown AS, Patel CJ. A review of validation strategies for computational drug repositioning. Brief Bioinformatics. 2018;19(1):174–7. doi: 10.1093/bib/bbw110.
5. Moher D, Liberati A, Tetzlaff J, Altman DG, PRISMA Group. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. Ann Intern Med. 2009;151(4):264–9. doi: 10.7326/0003-4819-151-4-200908180-00135.
6. Veritas Health Innovation. Covidence systematic review software. Melbourne, Australia; 2019.
7. Baker NC, Ekins S, Williams AJ, Tropsha A. A bibliometric review of drug repurposing. Drug Discov Today. 2018;23(3):661–72. doi: 10.1016/j.drudis.2018.01.018.
8. Yeung T-L, Sheng J, Leung CS, Li F, Kim J, Ho SY, et al. Systematic Identification of Druggable Epithelial-Stromal Crosstalk Signaling Networks in Ovarian Cancer. J Natl Cancer Inst. 2019;111(3):272–82. doi: 10.1093/jnci/djy097.
9. Wang Q, Xu R. Drug repositioning for prostate cancer: using a data-driven approach to gain new insights. AMIA Annu Symp Proc. 2017;2017:1724–33.
10. Brown AS, Rasooly D, Patel CJ. Leveraging Population-Based Clinical Quantitative Phenotyping for Drug Repositioning. CPT Pharmacometrics Syst Pharmacol. 2018;7(2):124–9. doi: 10.1002/psp4.12258.
11. Gottlieb A, Altman RB. Integrating systems biology sources illuminates drug action. Clin Pharmacol Ther. 2014;95(6):663–9. doi: 10.1038/clpt.2014.51.
12. Gayvert KM, Dardenne E, Cheung C, Boland MR, Lorberbaum T, Wanjala J, et al. A computational drug repositioning approach for targeting oncogenic transcription factors. Cell Rep. 2016;15(11):2348–56. doi: 10.1016/j.celrep.2016.05.037.
13. Grenier L, Hu P. Computational drug repurposing for inflammatory bowel disease using genetic information. Comput Struct Biotechnol J. 2019;17:127–35. doi: 10.1016/j.csbj.2019.01.001.
14. Cohen T, Widdows D, Schvaneveldt RW, Davies P, Rindflesch TC. Discovering discovery patterns with Predication-based Semantic Indexing. J Biomed Inform. 2012;45(6):1049–65. doi: 10.1016/j.jbi.2012.07.003.
15. Tan F, Yang R, Xu X, Chen X, Wang Y, Ma H, et al. Drug repositioning by applying 'expression profiles' generated by integrating chemical structure similarity and gene semantic similarity. Mol Biosyst. 2014;10(5):1126–38. doi: 10.1039/c3mb70554d.
16. Zhao M, Yang CC. Automated Off-label Drug Use Detection from User Generated Content. In: Proceedings of the 8th ACM International Conference. New York, NY, USA: ACM Press; 2017.
17. Zhao Q-Q, Li X, Luo L-P, Qian Y, Liu Y-L, Wu H-T. Repurposing of Approved Cardiovascular Drugs against Ischemic Cerebrovascular Disease by Disease-Disease Associated Network-Assisted Prediction. Chem Pharm Bull. 2019;67(1):32–40. doi: 10.1248/cpb.c18-00634.
18. Lotfi Shahreza M, Ghadiri N, Mousavi SR, Varshosaz J, Green JR. Heter-LP: A heterogeneous label propagation algorithm and its application in drug repositioning. J Biomed Inform. 2017;68:167–83. doi: 10.1016/j.jbi.2017.03.006.
19. Yan C-K, Wang W-X, Zhang G, Wang J-L, Patel A. BiRWDDA: A novel drug repositioning method based on multisimilarity fusion. J Comput Biol. 2019.
20. Wishart DS, Feunang YD, Guo AC, Lo EJ, Marcu A, Grant JR, et al. DrugBank 5.0: a major update to the DrugBank database for 2018. Nucleic Acids Res. 2018;46(D1):D1074–D82. doi: 10.1093/nar/gkx1037.
21. Kanehisa M, Sato Y, Furumichi M, Morishima K, Tanabe M. New approach for understanding genome variations in KEGG. Nucleic Acids Res. 2019;47(D1):D590–D5. doi: 10.1093/nar/gky962.
22. Davis AP, Grondin CJ, Johnson RJ, Sciaky D, McMorran R, Wiegers J, et al. The Comparative Toxicogenomics Database: update 2019. Nucleic Acids Res. 2019;47(D1):D948–D54. doi: 10.1093/nar/gky868.
23. Peng L, Zhu W, Liao B, Duan Y, Chen M, Chen Y, et al. Screening drug-target interactions with positive-unlabeled learning. Sci Rep. 2017;7(1):8087. doi: 10.1038/s41598-017-08079-7.
24. Gottlieb A, Stein GY, Ruppin E, Sharan R. PREDICT: a method for inferring novel drug indications with application to personalized medicine. Mol Syst Biol. 2011;7:496. doi: 10.1038/msb.2011.26.
25. Yamanishi Y, Araki M, Gutteridge A, Honda W, Kanehisa M. Prediction of drug-target interaction networks from the integration of chemical and genomic spaces. Bioinformatics. 2008;24(13):i232–40. doi: 10.1093/bioinformatics/btn162.
26. Keum J, Nam H. SELF-BLM: Prediction of drug-target interactions via self-training SVM. PLoS ONE. 2017;12(2):e0171839. doi: 10.1371/journal.pone.0171839.
27. Zhang W, Yue X, Huang F, Liu R, Chen Y, Ruan C. Predicting drug-disease associations and their therapeutic function based on the drug-disease association bipartite network. Methods. 2018;145:51–9. doi: 10.1016/j.ymeth.2018.06.001.
28. Bakal G, Talari P, Kakani EV, Kavuluru R. Exploiting semantic patterns over biomedical knowledge graphs for predicting treatment and causative relations. J Biomed Inform. 2018;82:189–99. doi: 10.1016/j.jbi.2018.05.003.
29. Cheng F, Desai RJ, Handy DE, Wang R, Schneeweiss S, Barabasi A-L, et al. Network-based approach to prediction and population-based validation of in silico drug repurposing. Nat Commun. 2018;9(1):2691. doi: 10.1038/s41467-018-05116-5.
30. Grammer AC, Ryals MM, Heuer SE, Robl RD, Madamanchi S, Davis LS, et al. Drug repositioning in SLE: crowd-sourcing, literature-mining and Big Data analysis. Lupus. 2016;25(10):1150–70. doi: 10.1177/0961203316657437.
31. Chen B, Wei W, Ma L, Yang B, Gill RM, Chua M-S, et al. Computational discovery of niclosamide ethanolamine, a repurposed drug candidate that reduces growth of hepatocellular carcinoma cells in vitro and in mice by inhibiting cell division cycle 37 signaling. Gastroenterology. 2017;152(8):2022–36. doi: 10.1053/j.gastro.2017.02.039.
